<?xml version="1.0" encoding="UTF-8"?>
<itemContainer xmlns="http://omeka.org/schemas/omeka-xml/v5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://omeka.org/schemas/omeka-xml/v5 http://omeka.org/schemas/omeka-xml/v5/omeka-xml-5-0.xsd" uri="https://omeka.ibu.edu.ba/items/browse?output=omeka-xml&amp;page=62&amp;sort_field=Dublin+Core%2CTitle" accessDate="2026-06-12T08:52:17+01:00">
  <miscellaneousContainer>
    <pagination>
      <pageNumber>62</pageNumber>
      <perPage>10</perPage>
      <totalResults>3494</totalResults>
    </pagination>
  </miscellaneousContainer>
  <item itemId="634" public="1" featured="0">
    <fileContainer>
      <file fileId="627">
        <src>https://omeka.ibu.edu.ba/files/original/730719527e5578fd3d43ba30f199617d.pdf</src>
        <authentication>3d79986ea43502148cc1dca1e5cfe0dc</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="5008">
                    <text>PROCEEDINGS

The 5th International Symposium on Sustainable Development

ISSD 2014

COMPARISON OF CODON USAGE IN MITOCHONDRIAL GENOMES OF
RHINOLOPHID AND HIPPOSIDERID BATS
Semir Dorić¹ and Lada Lukić Bilela¹,²#
¹ Dept. of Biology, Faculty of Science, University of Sarajevo, Zmaja od Bosne 33-35, 71 000 Sarajevo, Bosnia and Herzegovina; semir.doric@gmail.com; llbilela@gmail.com
² Biospeleological Society in Bosnia and Herzegovina, Avde Jabučice 30, 71 000 Sarajevo, Bosnia and Herzegovina; biospeld2008@gmail.com
# Corresponding author

Abstract
According to current phylogenetic hypotheses, the bats of the families Rhinolophidae and Hipposideridae are sister groups nested within the clade of Pteropodiformes. The Hipposideridae are a family of bats commonly known as the Old World leaf-nosed bats. Although this family was long considered a rhinolophid subfamily, Hipposiderinae, it is now more generally classified as a family in its own right. The Hipposideridae contain 10 living genera and more than 70 species, most of them in the widespread genus Hipposideros. This study attempts to confirm the distinction between these two families by comparing codon usage in the complete set of mitochondrial protein-coding genes from the currently available mitochondrial (mt) genomes of rhinolophid and hipposiderid bats. INCA 2.1 and GCUA 2.0 were used to compute codon usage. The Measure Independent of Length and Composition (MILC) was used to estimate the codon usage of 13 mt protein-coding genes from five species of the genus Rhinolophus and one species of Hipposideros (only four genes were available from H. larvatus). Large sets of randomly generated sequences were used to test for dependence on (i) sequence length, (ii) the overall amount of codon bias and (iii) codon bias discrepancy in the sequences. Statistical estimation based on absolute frequency values suggests no significant differences in codon usage bias among the analyzed rhinolophid species, despite the altered MILC values for nd1 and nd3 of Hipposideros armiger.
Keywords: MILC, MELP, bats, codon usage, codon frequencies


1. Introduction
Rhinolophidae split from their sister family, the Hipposideridae, towards the end of the Eocene (Maree and Grant, 1997; McKenna and Bell, 1997; Teeling et al., 2003; Eick et al., 2005). This estimate of divergence is congruent with the fossil data, with fossils of extinct Rhinolophus and Hipposideros species first occurring in middle Eocene deposits (ca. 49–37 MYA; Simmons and Geisler, 1998). The family Rhinolophidae Gray, 1825 consists of a single genus, Rhinolophus Lacépède, 1799. The taxon is exclusively Old World, with at least 77 species (Simmons, 2005) occurring in both temperate and tropical areas throughout the Afrotropical, Australian, Indomalayan, Oceanian and Palaearctic regions (Csorba et al., 2003). Some earlier classifications recognized two subfamilies within Rhinolophidae, the Hipposiderinae and the Rhinolophinae (Koopman, 1993, 1994; McKenna and Bell, 1997; Simmons and Geisler, 1998; Teeling et al., 2002), whereas others excluded the hipposiderids from this family (Corbet and Hill, 1992; Bates and Harrison, 1997; Simmons, 2005). Interestingly, Rhinolophus monoceros (used in our study) is often treated as a Taiwanese endemic, although it is very similar to R. pusillus of the mainland in body size, echolocation call frequency and mitochondrial gene sequences (Li et al., 2006). It is perhaps best treated as a synonym of R. pusillus, especially given that the Taiwanese Hipposideros terasensis is considered synonymous with H. armiger of the mainland (Simmons, 2005).
More recent studies of deep rhinolophid phylogeny, based on cytochrome b and three nuclear introns (thyrotropin, thyroglobulin and protein kinase, PRKC1), used hipposiderid bats as the outgroup (Stoffberg et al., 2010). A multidisciplinary approach, combining morphometric measurements (Bogdanowicz, 1992), recording and analysis of echolocation signals, karyotypic variation (Koubínová et al., 2010) and D-loop sequence analysis, contributes to resolving the correct phylogenetic position of species within these two families (Stoffberg et al., 2010). According to current phylogenetic hypotheses, the bats of the families Rhinolophidae and Hipposideridae are sister groups nested within the clade of Pteropodiformes (Koubínová et al., 2010). Rhinolophidae form a monophyletic group that can be divided into at least two major clades, the predominantly African and the predominantly Oriental clade, based on the current biogeographical distributions of the majority of species within each clade. Morphological (Bogdanowicz, 1992) and cytochrome b (Guillén-Servent et al., 2003) analyses also suggest that the African rhinolophids form a monophyletic clade. The typical metazoan mitochondrial (mt) genome comprises a single circular, double-stranded DNA molecule of 14 to 18 kb that contains a uniform set of 37 genes (Boore, 1999). Mitochondrial genomes are a powerful tool in phylogenetic analyses for elucidating complex relationships among taxa. Rapidly evolving mitochondrial genes can distinguish even closely related species and have therefore been employed in conservation genetic studies (Avise, 1995). Generating complete mt genome sequences is thus important both for evolutionary studies and for the conservation management of endangered species.
The aim of this study was to compare codon usage in the mitochondrial genes of rhinolophid and hipposiderid bats and to identify possible differences in codon frequency values, as a contribution to elucidating the phylogenetic relationship between these two families.
2. Materials and methods
The nucleotide sequences of 13 (nd1, nd2, nd3, nd4, nd4l, nd5, nd6, cox1, cox2, cytb, cox3,
atp6, atp8) mitochondrial protein-coding genes from five species of the family Rhinolophidae
and Hipposideros armiger (Hodgson, 1835) as well as four genes (nd1, nd2, cox1, cytb) from
H. larvatus (Horsfield, 1823) were obtained from GenBank (NCBI) (Table 1).

Table 1. Selected species of the families Rhinolophidae and Hipposideridae used for sequence analysis, with their mitochondrial genome accession numbers in the GenBank database.

Selected species                                Accession number
Rhinolophus ferrumequinum (Schreber, 1774)      NC_020326.1
R. pumilus K. Andersen, 1905                    NC_005434.1
R. monoceros K. Andersen, 1905                  NC_005433.1
R. formosae (Sanborn, 1939)                     NC_011304.1
R. luctus (Temminck, 1834)                      NC_018539.1
Hipposideros armiger (Hodgson, 1835)            NC_018540.1
H. larvatus (Horsfield, 1823)                   JX861075.1
H. larvatus (nd1)                               DQ888653.1
H. larvatus (nd2)                               JQ915493.1
H. larvatus (cox1)                              JQ365642.1
H. larvatus (cytb)                              EU434949.1
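The accessions in Table 1 can also be retrieved programmatically. The Python sketch below only builds NCBI E-utilities efetch URLs (database nuccore, rettype fasta_cds_na, which returns the annotated coding sequences of a record as FASTA text); it is an illustrative, assumed workflow and not necessarily how the sequences were obtained for this study. The helper name efetch_cds_url is hypothetical, and only the five rhinolophid accessions are listed.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Rhinolophid mitochondrial genome accessions listed in Table 1.
ACCESSIONS = {
    "Rhinolophus ferrumequinum": "NC_020326.1",
    "R. pumilus": "NC_005434.1",
    "R. monoceros": "NC_005433.1",
    "R. formosae": "NC_011304.1",
    "R. luctus": "NC_018539.1",
}


def efetch_cds_url(accession):
    """Build an efetch URL returning the CDS of a nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta_cds_na", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


# One URL per species; downloading is a separate, network-dependent step.
urls = {species: efetch_cds_url(acc) for species, acc in ACCESSIONS.items()}
```

With network access, urllib.request.urlopen(urls["R. luctus"]).read().decode() would return the FASTA text, from which the 13 protein-coding genes can be selected by their gene annotations.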

INCA 2.1 (Supek and Vlahoviček, 2006) and GCUA 2.0 (Fuhrmann et al., 2004) were used to compute codon usage. The Measure Independent of Length and Composition (MILC; Supek and Vlahoviček, 2005) was used to estimate the codon usage of thirteen mtDNA genes of the selected species of the genera Rhinolophus and Hipposideros. Large sets of randomly generated sequences were used to test for dependence on (i) sequence length, (ii) the overall amount of codon bias and (iii) codon bias discrepancy in the sequences. The MILC-based Expression Level Predictor (MELP) was used to quantitatively predict the expression levels of the selected mitochondrial genes.
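The MILC statistic can be illustrated with a short Python sketch. It follows our reading of Supek and Vlahoviček (2005): MILC = Σ_a M_a / L − C, where M_a = 2 Σ_c O_c ln(O_c / E_c) runs over the synonymous codons c of amino acid a, O_c and E_c are observed and expected codon counts, L is the number of codons in the gene, and C = Σ_a (r_a − 1) / L − 0.5 corrects for the r_a synonymous codons of each amino acid. The sketch assumes the vertebrate mitochondrial code (NCBI translation table 2), skips stop codons, and uses uniform synonymous usage as the default expectation; INCA 2.1 may derive expected counts differently, so its values need not match. MELP is computed here as the ratio of MILC against a genome-average reference to MILC against a highly expressed reference set, which is how we understand the predictor. All function names are hypothetical.

```python
import math
from collections import Counter

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial code); bases in TCAG order,
# first codon position varying slowest. "*" marks stop codons (TAA, TAG, AGA, AGG).
VMT_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {
    a + b + c: VMT_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous codon families, stop codons excluded.
FAMILIES = {}
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def codon_counts(seq):
    """Count in-frame sense codons (stop codons and ambiguous triplets are skipped)."""
    seq = seq.upper()
    triplets = (seq[p:p + 3] for p in range(0, len(seq) - len(seq) % 3, 3))
    return Counter(t for t in triplets if CODE.get(t, "*") != "*")


def milc(counts, expected=None):
    """MILC = sum_a M_a / L - C, with M_a = 2 * sum_c O_c * ln(O_c / E_c) and
    C = sum_a (r_a - 1) / L - 0.5, summed over amino acids present in the gene.
    `expected` maps codon -> expected frequency within its synonymous family;
    uniform usage (1 / r_a) is assumed when it is omitted."""
    L = sum(counts.values())
    total_m, family_term = 0.0, 0.0
    for aa, syn in FAMILIES.items():
        n_a = sum(counts.get(c, 0) for c in syn)
        if n_a == 0:
            continue  # amino acid absent from this gene
        family_term += (len(syn) - 1) / L
        for c in syn:
            o = counts.get(c, 0)
            if o:  # codons with O_c = 0 contribute nothing to M_a
                e = n_a * (expected[c] if expected else 1 / len(syn))
                total_m += 2 * o * math.log(o / e)
    return total_m / L - (family_term - 0.5)


def melp(counts, genome_expected, high_expr_expected):
    """MELP as MILC against genome-average usage divided by MILC against a
    highly expressed reference set (both reference tables are user-supplied)."""
    return milc(counts, genome_expected) / milc(counts, high_expr_expected)
```

For a toy sequence in which each of the four alanine codons occurs once, every M_a is zero and milc(codon_counts("GCTGCCGCAGCG")) equals 0.5 − 3/4 = −0.25; in a real gene with hundreds of codons the correction term C becomes much smaller.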
3. Results and discussion
MILC and MELP values were calculated in INCA 2.1 for thirteen protein-coding genes of six species and for four protein-coding genes of seven species (Table 2). Reference MILC values were used as the observed absolute frequencies of codon usage for the selected genes (Fo). The percentage difference in codon usage bias among the selected species was estimated with a χ2 test and with GCUA 2.0.
Table 2. Average MILC and MELP values for 13 protein-coding genes from five rhinolophid and one hipposiderid bat (A) and for four protein-coding genes (nd1, nd2, cox1, cytb) from the same species and Hipposideros larvatus (B).

(A)
Selected species              MILC     MELP
Rhinolophus ferrumequinum     0.631    1.011
R. pumilus                    0.627    1.009
R. monoceros                  0.635    1.015
R. formosae                   0.615    1.030
R. luctus                     0.633    1.021
Hipposideros armiger          0.666    1.007

(B)
Selected species              MILC     MELP
Rhinolophus ferrumequinum     0.516    1.057
R. pumilus                    0.525    1.050
R. monoceros                  0.527    1.054
R. formosae                   0.511    1.043
R. luctus                     0.527    1.043
Hipposideros armiger          0.509    1.030
Hipposideros larvatus         0.522    1.022


Absolute differences between the theoretical (Ft) and observed (Fo) codon frequency values were less than 1% when comparing thirteen protein-coding genes from six species and four protein-coding genes from seven species (Graph 1).
Graph 1. Absolute difference between Ft and Fo for MILC values for 13 protein-coding genes from five rhinolophid and one hipposiderid bat (A) and for four protein-coding genes, including those of Hipposideros larvatus (B).


The obtained χ2 value was 0.00218 (p = 0.01, n = 5; six species) and 0.0096 (p = 0.009, n = 6; seven species). In both cases the χ2 value lies far below the critical value for the corresponding degrees of freedom, which indicates no significant difference. However, analysis with GCUA 2.0 revealed important differences in codon usage frequencies. Absolute differences in codon usage frequency were 8.19% between Hipposideros armiger and Rhinolophus ferrumequinum, 8.17% between H. armiger and R. monoceros, 9.89% between H. armiger and R. pumilus, and 8.3% between H. armiger and R. luctus.
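To make the χ2 reasoning concrete, the Python sketch below computes a Pearson statistic, Σ (O − E)² / E, and its upper-tail probability using only the standard library. The example values are hypothetical placeholders, not the data analysed in this study, and taking the degrees of freedom as the number of compared categories minus one is an assumption about how n was defined above. A very small χ2 gives an upper-tail probability close to 1, i.e. no evidence of a difference; for reference, the tabulated 5% critical values are 11.07 for five and 12.59 for six degrees of freedom.

```python
import math


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)**2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution with df
    degrees of freedom, using the recurrence
    Q(x; v + 2) = Q(x; v) + (x/2)**(v/2) * exp(-x/2) / Gamma(v/2 + 1)."""
    half = x / 2.0
    if df % 2:  # odd df: start from Q(x; 1) = erfc(sqrt(x/2))
        q, v = math.erfc(math.sqrt(half)), 1
    else:       # even df: start from Q(x; 2) = exp(-x/2)
        q, v = math.exp(-half), 2
    while v != df:
        q += half ** (v / 2) * math.exp(-half) / math.gamma(v / 2 + 1)
        v += 2
    return q


# Hypothetical example values (not data from this study).
observed = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48]
expected = [0.50] * len(observed)
stat = chi_square(observed, expected)
p = chi2_sf(stat, df=len(observed) - 1)
```

The closed-form recurrence avoids a dependency on SciPy; with SciPy available, scipy.stats.chi2.sf(stat, df) gives the same tail probability.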
The typical deviation from the universal genetic code observed in the mitochondrial genomes of rhinolophid and hipposiderid bats is similar to that already found in other vertebrates: TGA codes for tryptophan instead of serving as a stop codon. Hipposideros armiger prefers TGG (Trp), with an absolute frequency of 0.154, compared with frequencies of 0.05–0.65 in the five species of the genus Rhinolophus.
Among the termination codons, UAA was the most preferred by all analyzed species, followed by AGA, whereas the codon AGG was found neither in the mitochondrial genes of the five rhinolophid species nor in those of H. armiger and H. larvatus. AGA and AGG were thought to have become mitochondrial stop codons early in vertebrate evolution (Osawa et al., 1989). However, at least in humans, it has now been shown that AGA and AGG sequences are not recognized as termination codons. The UAG codon is used more often by H. armiger than by the rhinolophid bats, with a difference of 4.04%. The UGA Stop-to-Trp change is, in fact, the most frequently occurring codon reassignment known. Disappearance of UAG would be favored by mutation pressure that increases the AU content. UAG may be reassigned less frequently than UGA because of the relative difficulty of the required change in the tRNA: in the case of UGA, the existing tRNA-Trp can simply mutate its anticodon (Sengupta et al., 2007).
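The codon reassignments discussed above can be checked directly on sequence data. The Python sketch below encodes NCBI translation table 2 (vertebrate mitochondrial code), in which TGA is read as tryptophan, ATA as methionine, and AGA/AGG are listed as stop codons, and then reports the relative frequencies of the two Trp codons and the terminal codon of a gene. The function names and the input dictionary of gene sequences are hypothetical. Many vertebrate mt genes end with an incomplete stop codon (T or TA) that is completed by polyadenylation of the transcript, so the sketch reports such ends as incomplete instead of guessing.

```python
from collections import Counter

BASES = "TCAG"
# NCBI translation table 2, bases in TCAG order (first position varies slowest).
VMT_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {
    a + b + c: VMT_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = sorted(c for c, aa in CODE.items() if aa == "*")


def trp_usage(seqs):
    """Relative frequencies of the Trp codons (TGA, TGG) pooled over all genes."""
    counts = Counter()
    for seq in seqs.values():
        seq = seq.upper()
        for p in range(0, len(seq) - 2, 3):  # complete in-frame triplets only
            codon = seq[p:p + 3]
            if CODE.get(codon) == "W":
                counts[codon] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return {c: counts[c] / total for c in ("TGA", "TGG")}


def terminal_codon(seq):
    """Terminal codon of a gene; partial stops (T or TA) are flagged as incomplete."""
    seq = seq.upper()
    rest = len(seq) % 3
    if rest:
        return "incomplete: " + seq[-rest:]
    last = seq[-3:]
    return last if last in STOPS else "no stop: " + last
```

For example, trp_usage({"nd1": nd1_seq, "cox1": cox1_seq}) would pool the Trp codons of two (hypothetical) sequence variables, and mapping terminal_codon over all 13 genes tabulates the termination codons used by each species.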
Synonymous codons are not used with equal frequencies. Based on a multivariate analysis of codon usage data from unicellular organisms, Grantham et al. (1980) proposed the genome hypothesis, which implies a relationship between codon usage and taxonomic distance. A correlation between taxonomic divergence and the similarity of the codon "dialect" was noticed long ago (Ikemura, 1985; Maruyama et al., 1986).
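The idea that species with similar codon "dialects" are taxonomically closer can be explored with a simple distance between codon usage tables. The Python sketch below converts codon counts into relative frequencies and measures their difference as half the sum of absolute differences, expressed in percent (0% for identical usage, 100% for completely disjoint usage). This metric is an assumption chosen for illustration; it is not necessarily the measure implemented in GCUA 2.0, and the function names and species tables are hypothetical.

```python
from itertools import combinations


def frequencies(counts):
    """Convert codon counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def usage_difference(counts_a, counts_b):
    """Percentage difference between two codon usage tables:
    50 * sum over codons of |f_a(c) - f_b(c)| (0 = identical, 100 = disjoint)."""
    fa, fb = frequencies(counts_a), frequencies(counts_b)
    codons = set(fa) | set(fb)
    return 50.0 * sum(abs(fa.get(c, 0.0) - fb.get(c, 0.0)) for c in codons)


def distance_table(tables):
    """Pairwise percentage differences for a dict of species -> codon counts."""
    return {
        (a, b): usage_difference(tables[a], tables[b])
        for a, b in combinations(sorted(tables), 2)
    }
```

Given per-species codon counts, distance_table returns one value per species pair; such a matrix could then be passed to a clustering method (for example UPGMA) to check whether codon usage groups the rhinolophid species apart from the hipposiderids.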
4. Concluding remarks
During the last decade, analysis of the mitochondrial genome has become a powerful tool for resolving the phylogenetic relationships among the various eukaryotic lineages and for elucidating the early events in the evolution of multicellularity. MILC and MELP algorithms have proved to be excellent tools for mitogenomic phylogeny and molecular evolution studies.
The codon usage comparison of 13 mt protein-coding genes from five species of the genus Rhinolophus and one species of Hipposideros showed differences of 8.17 to 9.89% in codon frequency values using GCUA 2.0. However, the absence of any statistically significant difference in codon usage among the selected species in the INCA 2.1 results could be explained by the fact that the estimation was based on only one complete set of mitochondrial protein-coding genes (H. armiger) and four genes (H. larvatus) from hipposiderid bats. Furthermore, the MILC algorithm has proved to be very sensitive, discriminating genes by their length, alternative stop codons and nucleotide composition. MILC values for the nd1 and nd3 genes of hipposiderid bats differ from those of the selected rhinolophid species because of their nucleotide composition. This may indicate further differences in complete mtDNA sequences, which could help elucidate the phylogenetic relationships within these two chiropteran families.


5. References
Agnarsson, I., Zambrana-Torrelio, C.M., Flores-Saldana, N.P., &amp; May-Collado, L.J. (2011). A time-calibrated
species-level phylogeny of bats (Chiroptera, Mammalia). PLOS Currents, 10, 13-71.
Avise, J.C. (1995). Mitochondrial DNA polymorphism and a connection between genetics and demography of
relevance to conservation. Conservation Biology, 9, 686–690.
Bogdanowicz, W. (1992). Phenetic relationships among bats of the family Rhinolophidae. Acta Theriologica, 37,
213–240.
Boore, J.L. (1999). Animal mitochondrial genomes. Nucleic Acids Research, 27, 1767–1780.
Csorba, G., Ujhelyi, P., &amp; Thomas, N. (2003). Horseshoe Bats of the World (Chiroptera: Rhinolophidae). Alana
Books, Shropshire. p. 160.
Eick, G.N., Jacobs, D.S., &amp; Matthee, C.A. (2005). A nuclear DNA phylogenetic perspective on the evolution of
echolocation and historical biogeography of extant bats (Chiroptera). Molecular Biology and Evolution, 22,
1869–1886.
Fuhrmann, M., Hausherr, A., Ferbitz, L., Schödl, T., Heitzer, M., &amp; Hegemann, P. (2004). Monitoring dynamic
expression of nuclear genes in Chlamydomonas reinhardtii by using a synthetic luciferase reporter gene. Plant
Molecular Biology, 55(6), 869–881.
Grantham, R., Gautier, C., Gouy, M., Mercier, R., &amp; Pave, A. (1980). Codon catalog usage and the genome
hypothesis. Nucleic Acids Research, 8, 49-62.
Guillén-Servent, A., Francis, C.M., &amp; Ricklefs, R.E. (2003). Phylogeny and biogeography of the horseshoe bats.
In: Csorba, G., Ujhelyi, P., &amp; Thomas, N. (Eds.), Horseshoe Bats of the World. Alana Books, p. 160.
Ikemura, T. (1985). Codon usage and tRNA content in unicellular and multicellular organisms. Molecular
Biology and Evolution, 2, 13-34.
Koubínová, D., Sreepada, K. S., Koubek, P., &amp; Zima, J. (2010). Karyotypic Variation in Rhinolophid and
Hipposiderid Bats (Chiroptera: Rhinolophidae, Hipposideridae). Acta Chiropterologica, 12(2), 393-400.
Li, G., Jones, G., Rossiter, S.J., Chen, S.F., Parsons, S. &amp; Zhang, S. (2006). Phylogenetics of small horseshoe
bats from East Asia based on mitochondrial DNA sequence variation. Journal of Mammalogy, 87, 1234-1240.
Maree, S., &amp; Grant, W.S. (1997). Origins of horseshoe bats (Rhinolophus, Rhinolophidae) in southern Africa:
evidence from allozyme variability. Journal of Mammalian Evolution, 4, 195–214.
Maruyama, T., Gojobori, T., Aota, S.I., &amp; Ikemura, T. (1986). Codon usage tabulated from the GenBank genetic
sequence data. Nucleic Acids Research, 14, 151–197.
McKenna, M.C., &amp; Bell, S.K. (1997). Classification of Mammals above the Species Level. Columbia University
Press. p. 631.
Osawa, S., Ohama, T., Jukes, T.H., &amp; Watanabe, K. (1989). Evolution of the mitochondrial genetic code. I.
Origin of AGR serine and stop codons in metazoan mitochondria. Journal of Molecular Evolution, 29(3), 202–207.
Sengupta, S., Yang, X., &amp; Higgs, P. G. (2007). The Mechanisms of Codon Reassignments in Mitochondrial
Genetic Codes. Journal of Molecular Evolution, 64, 662-688.
Simmons, N.B., &amp; Geisler, J.H. (1998). Phylogenetic relationships of Icaronycteris, Archaeonycteris,
Hassianycteris and Palaeochiropteryx to extant bat lineages, with comments on the evolution of echolocation and
foraging strategies in Microchiroptera. The Bulletin of the American Museum of Natural History, 235, 1–182.
Simmons, N.B. (2005). Order Chiroptera. In: Wilson, D.E., &amp; Reeder, D.M. (Eds.), Mammal Species of the
World: A Taxonomic and Geographic Reference, third ed. Smithsonian Institution Press, pp. 312–529.
Stoffberg, S., Jacobs, D.S., Mackie, I.J., &amp; Matthee, C.A. (2010). Molecular phylogenetics and historical
biogeography of Rhinolophus bats. Molecular Phylogenetics and Evolution, 54, 1–9.


Supek, F., &amp; Vlahoviček, K. (2005). Comparison of codon usage measures and their applicability in prediction
of microbial gene expressivity. BMC Bioinformatics, 6, 182.
Supek, F., &amp; Vlahoviček, K. (2006). INCA: synonymous codon usage analysis and clustering by means of self-organizing maps. Bioinformatics, 20(14), 2329–2330.
Teeling, E.C., Madsen, O., Van Den Bussche, R.A., De Jong, W.W., Stanhope, M.J., &amp; Springer, M.S. (2002).
Microbat paraphyly and the convergent evolution of a key innovation in Old World rhinolophoid microbats.
PNAS USA, 99, 1431–1436.
Teeling, E.C., Madsen, O., Murphy, W.J., Springer, M.S., &amp; O’Brien, S.J. (2003). Nuclear gene sequences
confirm ancient link between New Zealand’s short-tailed bat and South American noctilionoid bats. Molecular
Phylogenetics and Evolution, 28, 308–319.
Vaughan, T., Ryan, J., &amp; Czaplewski, N. (2000). Mammalogy, 4th Edition. Toronto: Brooks Cole Press.
Wilson, D., &amp; Reeder, D. (2005). Mammal Species of the World, 3rd edition. Baltimore: Johns Hopkins
University Press.

Semir Dorić is a postgraduate student at the Faculty of Science, University of Sarajevo (Department of Biology), with a special interest in molecular biology and bioinformatics. In his diploma thesis, he analyzed codon usage in selected representatives of the family Rhinolophidae (Chiroptera), with a dedication to applying different software packages to improve the quality of his investigations.
Lada Lukić Bilela is an associate professor of Molecular Biology and Genomics at the Faculty of Science, University of Sarajevo (Department of Biology). She received her BSc in Biology (Faculty of Science, Sarajevo) and her MS and PhD in mitochondrial genomics of Porifera (Faculty of Science, Zagreb; field: molecular and cell biology). She was employed at the Laboratory of Molecular Genetics of the Ruđer Bošković Institute and completed her postdoctoral training at Johannes Gutenberg University of Mainz (Marie Curie Training Network fellowship).
</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="79">
            <name>Extent</name>
            <description>The size or duration of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="5000">
                <text>2451</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="5001">
                <text>COMPARISON OF CODON USAGE IN MITOCHONDRIAL GENOMES OF  RHINOLOPHID AND HIPPOSIDERID BATS</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="5002">
                <text>DORIĆ, Semir
LUKIĆ BILELA, Lada</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="5003">
                <text>According to current phylogenetic hypotheses, the bats of the families Rhinolophidae and Hipposideridae are sister groups nested within the clade of Pteropodiformes. The Hipposideridae are a family of bats commonly known as the Old World leaf-nosed bats. Although this family was long considered a rhinolophid subfamily, Hipposiderinae, it is now more generally classified as a family in its own right. The Hipposideridae contain 10 living genera and more than 70 species, most of them in the widespread genus Hipposideros. This study attempts to confirm the distinction between these two families by comparing codon usage in the complete set of mitochondrial protein-coding genes from the currently available mitochondrial (mt) genomes of rhinolophid and hipposiderid bats. INCA 2.1 and GCUA 2.0 were used to compute codon usage. The Measure Independent of Length and Composition (MILC) was used to estimate the codon usage of 13 mt protein-coding genes from five species of the genus Rhinolophus and one species of Hipposideros (only four genes were available from H. larvatus). Large sets of randomly generated sequences were used to test for dependence on (i) sequence length, (ii) the overall amount of codon bias and (iii) codon bias discrepancy in the sequences. Statistical estimation based on absolute frequency values suggests no significant differences in codon usage bias among the analyzed rhinolophid species, despite the altered MILC values for nd1 and nd3 of Hipposideros armiger. Keywords: MILC, MELP, bats, codon usage, codon frequencies</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="45">
            <name>Publisher</name>
            <description>An entity responsible for making the resource available</description>
            <elementTextContainer>
              <elementText elementTextId="5004">
                <text>International Burch University</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="40">
            <name>Date</name>
            <description>A point or period of time associated with an event in the lifecycle of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="5005">
                <text>2014-05-15</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="5006">
                <text>Article
PeerReviewed</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="5007">
                <text>ISSN 978-9958-834-36-3     </text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
    <tagContainer>
      <tag tagId="76">
        <name>Q Science (General),QH301 Biology,QH426 Genetics</name>
      </tag>
    </tagContainer>
  </item>
  <item itemId="3113" public="1" featured="0">
    <fileContainer>
      <file fileId="3881">
        <src>https://omeka.ibu.edu.ba/files/original/966fa15171b70b16a0139ddd3f08f798.pdf</src>
        <authentication>5bc17b869bc4211fc4b4cccd882de8f0</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="23980">
                    <text>Comparison of Decision Tree Methods for Intrusion Detection
Fatih Ozturk,
fozturk@ibu.edu.ba
Abdulhamit Subasi
International Burch University, Faculty of Engineering and Information Technologies,
71000, Sarajevo, Bosnia and Herzegovina.
asubasi@ibu.edu.ba
Abstract: The growing use of the Internet carries a risk of network attacks, and attack methods change every day; information security has therefore become a significant issue all over the world. Intrusion detection, which aims to identify unusual access or attacks in order to secure internal networks, is a major research problem in network security. There is currently an urgent need to detect, identify and prevent such attacks effectively. In this work, we compared the efficiency of decision tree methods in an intrusion detection system, in terms of accuracy, detection rate and false alarm rate for different attack types.
Keywords: Decision Tree; CART (Classification and Regression Trees); ID3; C4.5; Random Forest; Internet attack; Intrusion detection system (IDS).

1. Introduction
In today’s business-oriented world, information is very important; it is even considered an intangible asset. The fastest way to bring the necessary information to end users is via the Internet, which has become so deeply involved in our lives that we are dependent on it. It has even been used as an important component of business models (Shon &amp; Moon, 2007). The Internet brings customers and businesses together through applications such as websites and e-mail. This raises a very important concern: information security. Intrusion detection is one of the major research topics in preventing attacks from the Internet, for companies and end users alike. Firewalls may protect networks, but attacks are becoming more complicated day by day. Intrusion detection systems (IDSs) address this complexity by providing ways to resist different types of suspicious network communications and computing habits, assuming that the behaviour of intruders differs from that of an authenticated user (Stallings, 2006) (Tsai, et al. 2009).
IDSs are generally divided into two main categories based on their detection methods: anomaly detection and misuse (signature) detection (Anderson, 1995) (Rhodes, Mahaffey, &amp; Cannady, 2000). Anomaly detection flags deviations from normal operation as intrusions, whereas misuse detection catches well-known attacks against the system (Tsai, et al. 2009). Many issues should be considered when building an IDS, such as data collection, data pre-processing, intrusion recognition, reporting and response. The most important of these components is intrusion recognition (Wu, Banzhaf, 2009).
In the literature, since Denning first proposed the intrusion detection model in 1987 (Denning, 1987), different machine learning techniques have been used to develop anomaly and misuse detection systems. In particular, classifiers are used to decide whether a connection over the Internet is normal use or an attack (Wu, Banzhaf, 2009). Between the late 1980s and the early 1990s, combinations of expert systems and statistical methods were popular, and detection models were acquired from field experts. From the mid to late 1990s, normal/abnormal detection moved to automatic modelling, as artificial intelligence (AI) and machine learning enabled models to be built from data. Rule-based induction, classification and data clustering were widely referenced. Because IDSs face huge network traffic volumes, highly diverse data distributions and decision boundaries that are difficult to draw, constructing a good model is very challenging (Wu, Banzhaf, 2009). In this work, we compared different decision tree methods for intrusion detection using the KDD’99 dataset.

This paper is organized as follows. Section 2 provides a literature review on IDSs. Section 3 describes the dataset and the performance evaluation used in this paper. Section 4 gives a brief theoretical background for the different decision tree algorithms. Section 5 presents our analysis results, and we conclude in Section 6.

2. Literature review on intrusion detection systems
The IDS concept was first introduced in Anderson’s technical report (Anderson, 1980), which proposed using statistical methods to analyse users’ behaviour in order to detect misuse of the system. In 1987, Dorothy Denning laid the foundations of intrusion detection models and proposed a prototype IDS, IDES (Intrusion Detection Expert System) (Denning, 1987). A number of IDSs were released after this, such as Discovery, Haystack, MIDAS, NADIR, NSM, Wisdom &amp; Sense and DIDS (Bace, 2002) (Wu, Yen, 2009).
IDSs can be classified into two major areas: misuse detection and anomaly detection (Bace, 2002). Misuse detection tries to match incoming data with pre-defined intrusive behaviour (signatures), so well-known intrusions are detected very quickly and accurately (with low false alarm rates). For this reason, this method has been adopted by many commercial IDSs. However, intrusion methods are not trivial and evolve continuously, and when an unknown attack reaches the system, misuse detection fails. To overcome this shortcoming, the signature database has to be updated, at the expense of valuable time (Wu, Yen, 2009).
The second type, anomaly detection (Denning, 1987), can overcome this problem. Anomaly detection relies on the fact that most network traffic is normal, and it builds a model of this normal behaviour. Anything that deviates from it is marked as an anomaly, and the system raises an alarm. Since intrusions are rare and differ from normal data, this method catches most abnormal behaviour, even for unknown attacks. Its main difficulty is the false alarm rate, because the boundary between normal and abnormal data is often not clear-cut (Wu, Banzhaf, 2009). A further problem is that normal usage statistics change constantly. For example, more and more UDP connections are seen as people use video sharing sites more frequently. There are also hybrid approaches, used for example in MINDS (Ertoz et al., 2004), EMERALD and Prelude. These combine the two methods so as to overcome the disadvantages of each (Wu, Yen, 2009).
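The anomaly detection principle can be sketched in Python with a deliberately simple statistical profile. The sketch learns the mean and standard deviation of one traffic measure from normal data, then flags values lying more than k standard deviations away. The measure, the sample values and the threshold k = 3 are assumptions for illustration. The choice of k is where the trade-off between catching unknown attacks and raising false alarms shows up.

```python
import statistics


def fit_profile(normal_values: list[float]) -> tuple[float, float]:
    """Model 'normal behaviour' as the mean and population standard deviation."""
    return statistics.fmean(normal_values), statistics.pstdev(normal_values)


def is_anomaly(value: float, profile: tuple[float, float], k: float = 3.0) -> bool:
    """Flag a value more than k standard deviations away from the normal mean."""
    mean, std = profile
    if std == 0:
        # Degenerate profile: any deviation from the constant mean is anomalous.
        return value != mean
    return abs(value - mean) / std > k


# Hypothetical per-second connection counts observed during normal operation.
profile = fit_profile([10, 12, 11, 9, 10, 13, 11, 10])
print(is_anomaly(12, profile))   # False: within normal variation
print(is_anomaly(250, profile))  # True: far outside the learned profile
```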

3. Datasets and performance evaluation
The data used in our tests were derived from the well-known KDD’99 dataset. The underlying data were collected in 1998 by MIT Lincoln Laboratory and comprise seven weeks of training data and two weeks of test data. During this period the monitored UNIX and Windows NT hosts and a Cisco router were subjected to more than 300 instances of 38 attack types. The KDD’99 dataset was derived from these data in 1999 by assembling individual TCP packets into TCP connections. It was the benchmark dataset of the International Knowledge Discovery and Data Mining Tools Competition (Tsai et al., 2009).
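The following is a rough Python sketch of assembling packets into connection records. Packets are grouped by their two endpoints (address and port), so both directions of a conversation fall into one record, and each record is summarized by a few basic features. The Packet fields, the grouping key and the feature names are simplified assumptions. The real KDD’99 preprocessing was considerably more elaborate; for example, it also derived a connection status flag. This sketch also does not separate repeated connections between the same endpoints.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    time: float  # capture timestamp in seconds
    src: str     # source IP address
    sport: int   # source port
    dst: str     # destination IP address
    dport: int   # destination port
    size: int    # payload size in bytes


def assemble_connections(packets: list[Packet]) -> list[dict]:
    """Group packets by their endpoint pair and summarize each group as one record."""
    groups = defaultdict(list)
    for p in packets:
        # An unordered key puts both directions of a conversation in the same group.
        key = frozenset([(p.src, p.sport), (p.dst, p.dport)])
        groups[key].append(p)

    records = []
    for pkts in groups.values():
        pkts.sort(key=lambda q: q.time)
        first = pkts[0]  # the earliest packet identifies the originating side
        src_bytes = sum(q.size for q in pkts if (q.src, q.sport) == (first.src, first.sport))
        records.append({
            "src": first.src,
            "dst": first.dst,
            "dst_port": first.dport,
            "duration": pkts[-1].time - first.time,
            "src_bytes": src_bytes,
            "dst_bytes": sum(q.size for q in pkts) - src_bytes,
        })
    return records
```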
Each TCP connection is described by 41 features and a label specifying the status of the connection. The features (38 numeric and 3 symbolic) fall into the following categories:
1- Basic features: 9 features describing each individual TCP connection.
2- Content features: 13 features, based on domain knowledge, that indicate suspicious behaviour having no sequential patterns.
3- Time-based traffic features: 9 features summarizing the connections in the past 2 s that had the same destination host or the same service as the current connection.
4- Host-based traffic features: 10 features constructed over a window of the past 100 connections to the same host, instead of a time window, to capture attacks that span more than 2 s (Tsai et al., 2009).
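The two kinds of traffic features can be sketched as follows in Python. Here count and srv_count are computed over the connections that started in the 2 s before the current one. By contrast, dst_host_count and dst_host_srv_count use the previous 100 connections regardless of time. The names follow KDD’99 feature names, but the exact definitions below are simplifying assumptions; for example, the current connection is excluded from the counts.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    time: float     # connection start time in seconds
    dst_host: str   # destination host
    service: str    # destination service, e.g. "http"


def time_based_features(conns: list[Conn], i: int, window: float = 2.0) -> dict:
    """Counts over the connections that started within `window` seconds before conns[i]."""
    cur = conns[i]
    # conns is assumed sorted by start time, so conns[:i] all started earlier.
    recent = [c for c in conns[:i] if window >= cur.time - c.time]
    return {
        "count": sum(1 for c in recent if c.dst_host == cur.dst_host),
        "srv_count": sum(1 for c in recent if c.service == cur.service),
    }


def host_based_features(conns: list[Conn], i: int, n: int = 100) -> dict:
    """Counts over the previous n connections, whatever their timing."""
    cur = conns[i]
    last = conns[max(0, i - n):i]
    return {
        "dst_host_count": sum(1 for c in last if c.dst_host == cur.dst_host),
        "dst_host_srv_count": sum(
            1 for c in last if c.dst_host == cur.dst_host and c.service == cur.service
        ),
    }
```

For instance, a slow probe that spaces its connections to one host more than 2 s apart leaves count at zero. The same probe still raises dst_host_count, which is the motivation given above for the connection-count window.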
The KDD’99 training set has 4,940,000 data instances covering normal traffic and 24 attack types. The test set has 311,029 data instances with a total of 38 attack types, 14 of which do not appear in the training set. Because the full training set is so large, a 10% subset of it is usually used instead (Wu, Banzhaf, 2009).
The attacks contained in the data can be divided into the following categories:
Probe: Probes are not considered real attacks in themselves. They are often used as preparatory steps for a full-scale attack, to investigate the end system for future attacks.
DoS (Denial of Service): This type of attack usually keeps the server busy or consumes valuable bandwidth so that the server cannot provide the necessary service to its end users. The most common types are SYN flooding, ping flooding, etc.
U2R (User to Root): The attacker tries to gain control of the system’s administrator (root) account by exploiting vulnerabilities in the system. Buffer overflow is one such method.
R2L (Remote to Local): The attacker exploits a service provided by the server in order to obtain sensitive security information from the server or users’ personal files. Unicode leaks, SQL injection, etc. are examples of this attack type (Wu, Yen, 2009).
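To group raw labels into these categories in code, a lookup table is enough. The Python sketch below uses the commonly used mapping of the KDD’99 training-set attack labels to the four categories. Labels that are not listed, such as attacks that occur only in the test set, are returned as "unknown". The trailing period stripped by categorize reflects how labels are written in the KDD’99 data files.

```python
# Commonly used mapping of KDD'99 training-set attack labels to categories.
CATEGORY = {
    "back": "DoS", "land": "DoS", "neptune": "DoS", "pod": "DoS",
    "smurf": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
}


def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to 'normal', a category, or 'unknown'."""
    name = label.strip().rstrip(".")
    if name == "normal":
        return "normal"
    return CATEGORY.get(name, "unknown")
```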
In our experiments we selected a subset of 7000 instances to train our classifiers. This subset contains six attack types and normal data. The selected attack types, considered among the most widely used, are "port-sweep", "back-door", "IP-sweep", "nmap attack", "satan" and "smurf".
The best way to evaluate the effectiveness of an IDS is by its ability to make correct predictions. Comparing the true nature of a data sample with the prediction of the IDS gives four possible outcomes, collectively known as the truth (confusion) matrix:
True negative (TN): normal data predicted as normal;
True positive (TP): attack data predicted as attack;
False negative (FN): attack data predicted as normal;
False positive (FP): normal data predicted as attack.
TN/(TN+FP) gives the specificity, and the false alarm rate (FAR) is FP/(TN+FP) = 1 - specificity;
TP/(TP+FN) gives the sensitivity, or detection rate (DR).
The most widely used performance measures are the detection rate and the false alarm rate. A good IDS must have a high DR and a low or zero FAR (Wu, Yen, 2009) (Wu, Banzhaf, 2009).
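The measures above can be computed directly from the four confusion-matrix counts. This Python sketch assumes label 1 means attack and 0 means normal. For comparison, it also reports the conventional overall accuracy, (TP+TN)/N, and it guards each ratio against division by zero.

```python
def ids_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    """Confusion-matrix counts and rates, with label 1 = attack and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attack predicted as attack
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal predicted as normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal predicted as attack
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attack predicted as normal
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "DR": tp / (tp + fn) if tp + fn else 0.0,           # sensitivity
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "FAR": fp / (tn + fp) if tn + fp else 0.0,          # equals 1 - specificity
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }
```

For example, suppose the data contain four attacks, of which three are detected, and six normal connections, of which one is flagged. Then DR = 3/4 = 0.75, FAR = 1/6 and the overall accuracy is 8/10 = 0.8.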

4. Decision trees
In this paper we compare four different decision tree methods. A decision tree classifies a sample through a sequence of decisions, each leading to the next. Classification proceeds from the root node to a leaf, and the leaf holds the category information. Each internal node tests one attribute, and its branches correspond to the values of that attribute (Mitchell, 1997). The structure is very similar to a flowchart: attribute tests are internal nodes, test outcomes correspond to branches, and leaf nodes hold the class distribution. Decision tree methods belong to one of two categories: top-down tree construction or bottom-up pruning. The most essential and common classification tree methods are CART (Classification and Regression Trees) (Breiman, Friedman, Olshen, &amp; Stone, 1984), ID3 and C4.5 (Quinlan, 1993), all of which use top-down tree construction. The common algorithm can be described as follows:
(i) place the whole training set at the root node;
(ii) check whether the node contains samples of only one class, or is empty. If it contains samples of more than one class, evaluate each attribute according to a selection function and choose the most suitable one. Then divide the training set into N parts according to the values of that attribute, each part forming a new child node. This process is called node splitting;
(iii) check whether each new node is a leaf, and if not, split it as described in (ii);
(iv) continue splitting until every node is a leaf.
The result is the decision tree. With these methods any given training set can be completely partitioned into the branches and leaves of the tree (Wu, Yen, 2009).
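Steps (i) to (iv) can be sketched in Python as a recursive function. The selection function used here is information gain, ID3’s criterion. That choice, the tuple-based tree representation and the function names are illustrative assumptions, and only discrete attributes are handled.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows: list[dict], labels: list, attributes: list[str]):
    """Top-down induction over discrete attributes, following steps (i) to (iv).

    Returns either a class label (a leaf) or a tuple (attribute, {value: subtree}).
    """
    # Step (ii): a node holding a single class, or with no attributes left, is a leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attr: str) -> float:
        # Information gain = parent entropy minus size-weighted child entropy.
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)  # choose the most suitable attribute
    children = {}
    for value in set(r[best] for r in rows):  # one child node per observed value
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        # Steps (iii)-(iv): recursively split each child until it is a leaf.
        children[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attributes if a != best],
        )
    return (best, children)


def classify(tree, row: dict):
    """Follow branches from the root until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        attribute, children = tree
        tree = children[row[attribute]]
    return tree
```

Splitting stops only when a node is pure or no attributes remain, so the tree fits the training data completely, as described above. Note that classify raises a KeyError for an attribute value never seen in training; a fuller implementation would handle this case explicitly.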
In the ID3 method all attributes must, by definition, take discrete values. Our sample set has 3 symbolic features; to overcome this problem, we converted these values into discrete-valued attributes in a pre-processing step before evaluating our sample data. Under certain conditions decision trees have advantages over other common supervised learning methods such as discriminant analysis. In particular, they are not subject to the same restrictions on the probability distribution of the data. They also do not require a linear model to be assumed, and they are very useful with non-linear predictors (Wu, Banzhaf, 2009).
4.1 CART
In CART (Classification and Regression Trees) there are six general questions to be answered:
1- Should the properties be restricted to binary values, or should multi-valued properties be allowed?
2- Which property should be tested at a node?
3- When should a node be declared a leaf?
4- When the tree becomes too large, how can it be made smaller and simpler, i.e. pruned?
5- If a leaf node is impure, how should it be labelled?
6- How should missing data be handled?
Answering these questions yields a tree that gives good enough results, is easy to compute and responds quickly (Duda, Hart &amp; Stork, 2002).
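CART’s usual answers to questions 1 and 2 are binary splits chosen by minimizing the Gini impurity. The Python sketch below searches for the best binary split of a single numeric attribute. Taking candidate thresholds at midpoints between consecutive distinct values is a common convention and an assumption here. A full CART implementation would repeat this search over all attributes at every node.

```python
from collections import Counter


def gini(labels: list) -> float:
    """Gini impurity: one minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values: list[float], labels: list):
    """Best threshold t for the binary test "x > t" on one numeric attribute.

    Returns (threshold, weighted Gini impurity); threshold is None when no
    split lowers the impurity of the parent node.
    """
    n = len(labels)
    best_t, best_imp = None, gini(labels)
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        right = [lab for v, lab in zip(values, labels) if v > t]
        left = [lab for v, lab in zip(values, labels) if not v > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best_imp > imp:
            best_t, best_imp = t, imp
    return best_t, best_imp
```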
4.2 ID3
ID3 is so called because it was the third in a series of identification or "ID" procedures. It was intended for use with nominal (unordered) inputs only. When a real-valued variable is present, it is first divided into intervals, and each interval is then treated as a nominal value. Each split has a branching factor B_j, equal to the number of discrete attribute bins of the variable j chosen for splitting. These splits are usually not binary, so a gain ratio impurity should be used. The number of levels in the tree is equal to the number of input variables. The algorithm continues until all nodes are pure or no more splits are possible. If necessary, a standard pruning technique can be applied (Duda, Hart &amp; Stork, 2002).
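The interval binning and the gain ratio mentioned above can be sketched as follows in Python. Equal-width binning is only one possible discretization and is an assumption here. The gain ratio divides the information gain by the split information, that is, the entropy of the branch proportions. This penalizes splits with a large branching factor B_j.

```python
import math
from collections import Counter


def entropy(items: list) -> float:
    """Shannon entropy, in bits, of a list of discrete items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def equal_width_bins(values: list[float], k: int) -> list[int]:
    """Discretize real values into k equal-width intervals (bin indices 0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against all-equal values
    return [min(int((v - lo) / width), k - 1) for v in values]


def gain_ratio(attr_values: list, labels: list) -> float:
    """Information gain divided by split information (entropy of the branch sizes)."""
    n = len(labels)
    remainder = 0.0
    for value in set(attr_values):
        subset = [lab for a, lab in zip(attr_values, labels) if a == value]
        remainder += len(subset) / n * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(attr_values)  # grows with the number of branches
    return gain / split_info if split_info else 0.0
```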
4.3 C4.5
C4.5 is the successor and refinement of the ID3 method and is very popular among classification tree methods. Real-valued variables are handled as in CART. Nominal data are used to create multi-way splits, as in ID3, with a gain ratio impurity. Pruning is based on the statistical significance of the splits. The main difference between CART and C4.5 is the handling of missing features: C4.5 does not precompute substitute (surrogate) splits. If a test pattern with a missing feature reaches node N with branching factor B, C4.5 follows all B possible branches to the descendant nodes and ultimately to B leaf nodes. The final decision is based on the labels of the B leaf nodes, weighted by the decision probabilities at N. Unlike CART, this approach does not exploit statistical correlations between different features of the training points. However, it requires no extra computation or storage, so C4.5 is preferred where storage is a major concern. For pruning, C4.5 derives rules from the tree, one for each leaf node (the path from the root node to that leaf). It then deletes redundant antecedents from these rules (Duda, Hart &amp; Stork, 2002).
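The missing-feature rule described above can be sketched in Python. Each hypothetical internal Node stores, per branch, the fraction of training samples that followed it, and that fraction serves as the decision probability at N. A pattern missing the tested feature is sent down all B branches, and the leaf labels are combined with those weights. This covers classification only, not how C4.5 treats missing values during training.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    attribute: str   # feature tested at this node (node N in the text)
    children: dict   # feature value -> subtree (the B branches)
    weights: dict    # feature value -> fraction of training samples on that branch


def class_distribution(tree: Union[Leaf, Node], pattern: dict, weight: float = 1.0) -> Counter:
    """Return class -> probability mass reached by the pattern."""
    if isinstance(tree, Leaf):
        return Counter({tree.label: weight})
    value = pattern.get(tree.attribute)
    if value is not None:
        return class_distribution(tree.children[value], pattern, weight)
    # Missing feature: follow all B branches, weighting each by its decision probability.
    total = Counter()
    for branch, child in tree.children.items():
        total.update(class_distribution(child, pattern, weight * tree.weights[branch]))
    return total


def classify(tree: Union[Leaf, Node], pattern: dict) -> str:
    """Pick the class with the largest combined weight."""
    return class_distribution(tree, pattern).most_common(1)[0][0]
```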
4.4 Random forests
Decision trees can be less accurate than methods such as support vector machines, and the structure of the trees can also be unstable (Breiman, 1996). Small changes in the training data can cause significant changes to the tree, either to the identity of the split variable or to the value of the split. Yet even when the structure changes significantly, the resulting prediction may stay the same. The RandomForest (a trademark of Salford Systems) algorithm was developed by Breiman (Breiman, 2001a,b) to overcome these shortcomings while possibly enhancing interpretability. Some of the important features of RandomForest are (Breiman and Cutler, 2004b):
- Accuracy that equals or exceeds that of many current classifiers, without overfitting
- Fast on large databases; handles thousands of predictors without variable selection routines
- Estimates the importance of each predictor
- Generates an unbiased estimate of the generalization error
- Robust algorithms for dealing with missing data and for balancing error when class proportions are markedly different
- Computes proximities between pairs of cases, which can be used for clustering and identifying outliers
- An experimental method for detecting variable interactions
- Generated forests can be saved for use on other data.
The forest is made up of many trees, each differing from the others in the training cases used to grow it and in the predictors used at each node. At each node only a randomly drawn subset of m predictors is considered, much smaller than the total number available. The value of m is the only adjustable parameter to which random forests are sensitive (Breiman and Cutler, 2004a), and it is the same for all trees in the forest. Each tree is grown to the largest extent possible, so there is no need for pruning. Besides increasing prediction accuracy, random forests help to determine the importance of each variable and the associations between cases (Fielding, 2007).
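The ideas above can be combined in a compact from-scratch Python sketch: bootstrap samples of the training cases, m randomly drawn candidate predictors at each node, unpruned trees and a majority vote. It uses Gini impurity for the splits, as CART does. It defaults m to the square root of the number of predictors, a common choice for classification. Both of these are assumptions. The sketch also omits out-of-bag error estimates, variable importance and proximities.

```python
import random
from collections import Counter


def gini(labels: list) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X: list, y: list, m: int, rng: random.Random):
    """Grow an unpruned tree; each node tries only m randomly drawn predictors."""
    if len(set(y)) == 1:
        return y[0]  # pure node: leaf
    best = None  # (weighted impurity, feature index, threshold)
    for f in rng.sample(range(len(X[0])), m):
        for t in set(row[f] for row in X):
            left = [lab for row, lab in zip(X, y) if not row[f] > t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or best[0] > imp:
                best = (imp, f, t)
    if best is None:
        # The m drawn predictors cannot separate these cases: majority leaf.
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, row in enumerate(X) if not row[f] > t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in li], [y[i] for i in li], m, rng),
            grow_tree([X[i] for i in ri], [y[i] for i in ri], m, rng))


def predict_tree(node, x: list):
    """Descend from the root: go right when x[f] > t, else left."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = right if x[f] > t else left
    return node


def fit_forest(X: list, y: list, n_trees: int = 25, m: int = 0, seed: int = 0) -> list:
    """Grow n_trees trees, each on a bootstrap sample of the training cases."""
    rng = random.Random(seed)
    m = m or max(1, int(len(X[0]) ** 0.5))  # default: sqrt of predictor count
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # draw with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest


def predict_forest(forest: list, x: list):
    """Majority vote over the trees in the forest."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```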

5. Results and Discussion
Over the past decade, intrusion detection based on machine learning methods has been an extensively studied topic, as such methods help meet the growing demand for reliable and intelligent intrusion detection systems. In this study we compared the performance of different decision tree classifiers in solving intrusion detection problems. The classifiers were trained and tested on the KDD’99 dataset, and the classification results are shown in Table 1. The table shows no significant difference in accuracy between these methods; however, random forest performs better than the others, and on average ID3 performs worst. Here, accuracy refers to the proportion of attacks detected among all attack data (the TP case), which corresponds to the detection rate. In detection rate, random forest achieves approximately 99.94%. The false alarm rate refers to the proportion of normal data falsely detected as attack behaviour (the FP case). Random forest has a false alarm rate of 0.1%, which is worse than C4.5 and CART: on average, the false alarm rate of C4.5 and CART is 0%.

Decision Tree Method   Accuracy
ID3                    97.4 %
CART                   99.8286 %
C4.5                   99.8857 %
Random Forest          99.9429 %

Table 1: Classification of Different Decision Tree Algorithms
We have considered the problem of comparing different decision tree classifiers, including ID3, CART, C4.5 and random forest. Rather than directly comparing typical implementations of these methods, it is more useful to consider the distinctions among their component steps. A tree can be built from any practical combination of feature processing, impurity measure, stopping criterion and pruning method. Of course, if the designer has insight into feature pre-processing, it should be exploited. In general, pruning is preferable to stopped training and cross-validation, because it takes advantage of more of the information in the training set. On the other hand, pruning can be computationally expensive for large training sets. Rule pruning is less useful for problems that are highly noisy and fundamentally statistical in nature. As with most classification methods, one gains expertise and insight through experimentation on a wide range of problems. No single tree algorithm dominates, or is dominated by, other classification methods, and trees have been found to yield classifiers with accuracy comparable to that of other classification methods (Duda, Hart &amp; Stork, 2002).
Although different decision tree classifiers have achieved promising results in IDSs, challenges remain for researchers in this area. First and foremost, good benchmark datasets for network intrusion detection are needed. KDD’99 is the most important benchmark used to evaluate the performance of network intrusion detection systems. However, it suffers from a serious drawback: it fails to realistically simulate a real-world network (Brugger, 2007) (Mahoney, Chan, 2003) (McHugh, 2000). An IDS that works well on this dataset may perform unacceptably in real environments (Wu, Banzhaf, 2009).
These datasets possess some special characteristics, such as huge volume, high dimensionality and a highly skewed data distribution. As a result, using only these datasets is not adequate to demonstrate the efficiency of a learning algorithm. It is also worth noting that the KDD’99 datasets were collected about 10 years ago. An important characteristic of intrusion detection is the capability to adapt to continually changing environments. Not only does intrusive behaviour evolve continuously, but the legitimate behaviour of users, systems or networks also changes over time. If an IDS is not flexible enough to cope with these behavioural changes, its detection accuracy will decrease dramatically, so a focus on adaptation in IDSs is highly recommended. Another challenge in IDS design is the huge volume of audit data, which makes it difficult to build an effective IDS. Perhaps it is time to create a new, high-quality dataset for the intrusion detection task (Wu, Banzhaf, 2009).

6. Conclusion
Intrusion detection based on computational intelligence is currently attracting considerable interest from the research community. This research compared the accuracy, detection rate and false alarm rate of different decision tree methods for different attack types. The experiments used the KDD’99 dataset, the current benchmark dataset in intrusion detection. Comparing the decision tree algorithms, we find that random forest is superior to the others in accuracy and detection rate, while ID3 is the worst. In false alarm rate, however, C4.5 and CART are better than random forest. Combining random forest with C4.5 or CART could therefore increase overall accuracy. The KDD’99 dataset used in this research is widely used in current intrusion detection research. However, it dates from 1999, and network technology and attack methods have changed greatly since then, so it cannot reflect today’s real network conditions. Testing and comparing the methods on newer data would reflect current network conditions more accurately.

References:
Anderson, J. P. (1980). Computer security threat monitoring and surveillance. Technical report, James P. Anderson Co., Fort Washington, Pennsylvania.
Anderson, J. (1995). An introduction to neural networks. Cambridge: MIT Press.
Bace, R. G. (2002). NIST special publication on intrusion detection systems.
Breiman, L., Friedman, J. H., Olshen, R. A., &amp; Stone, C. J. (1984). Classification and regression trees. California: Wadsworth International Group.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140.
Breiman, L. (2001a). Random forests. Machine Learning, 45, 5–32.
Breiman, L. (2001b). Statistical modelling: the two cultures. Statistical Science, 16, 199–215.
Breiman, L., &amp; Cutler, A. (2004a). Interface Workshop: April 2004. Available at http://statwww.berkeley.edu/users/breiman/RandomForests/interface04.pdf (accessed 1 February 2006).
Breiman, L., &amp; Cutler, A. (2004b). Random Forests. Available at http://statwww.berkeley.edu/users/breiman/RandomForests/cc_home.htm (accessed 1 February 2006).
Brugger, T. (2007). KDD cup ’99 dataset (network intrusion) considered harmful, 15 September 2007. Retrieved January 26, 2008, from http://www.kdnuggets.com/news/2007/n18/4i.html.
Denning, D. E. (1987). An intrusion detection model. IEEE Transactions on Software Engineering.
Duda, R. O., Hart, P. E., &amp; Stork, D. (2002). Pattern classification (2nd ed.). John Wiley &amp; Sons.
Ertoz, L., Eilertson, E., Lazarevic, A., Tan, P., Srivastava, J., Kumar, V., et al. (2004). The MINDS – Minnesota intrusion detection system. Next generation data mining. MIT Press.
Fielding, A. H. (2007). Cluster and classification techniques for the biosciences. Cambridge, UK: Cambridge University Press.
Mahoney, M. V., &amp; Chan, P. K. (2003). An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection. Technical Report TR CS-2003-02, Computer Science Department, Florida Institute of Technology.
McHugh, J. (2000). Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Transactions on Information and System Security, 3 (4), 262–294.
Mitchell, T. (1997). Machine learning. New York: McGraw Hill.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann Publishers.
Rhodes, B., Mahaffey, J., &amp; Cannady, J. (2000). Multiple self-organizing maps for intrusion detection. Paper presented at the 23rd National Information Systems Security Conference, Baltimore, MD.
Shon, T., &amp; Moon, J. (2007). A hybrid machine learning approach to network anomaly detection. Information Sciences, 177, 3799–3821.
Stallings, W. (2006). Cryptography and network security: principles and practices. USA: Prentice Hall.
Tsai, C.-F., Hsu, Y.-F., Lin, C.-Y., &amp; Lin, W.-Y. (2009). Intrusion detection by machine learning: A review. Expert Systems with Applications, 36, 11994–12000.
Wu, S. X., &amp; Banzhaf, W. (2009). The use of computational intelligence in intrusion detection systems: A review. Applied Soft Computing, 10 (2010), 1–35.
Wu, S.-Y., &amp; Yen, E. (2009). Data mining-based intrusion detectors. Expert Systems with Applications, 36, 5605–5612.

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="79">
            <name>Extent</name>
            <description>The size or duration of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="23974">
                <text>535</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="23975">
                <text>Comparison of Decision Tree Methods for Intrusion Detection</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="23976">
                <text>Ozturk, Fatih
Subasi, Abdulhamit</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="23977">
                <text>The popularity of the Internet brings risks of network attacks, and attack methods change every day, so information security has become a significant issue all over the world. Intrusion detection is a major research problem in network security; its aim is to identify unusual access or attacks in order to secure internal networks. There is currently an urgent need to detect, identify and prevent such attacks effectively. In this work, we compared the efficiency of decision tree methods in intrusion detection systems, comparing the accuracy, detection rate and false alarm rate for different attack types.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="40">
            <name>Date</name>
            <description>A point or period of time associated with an event in the lifecycle of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="23978">
                <text>2010-06</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="23979">
                <text>Conference or Workshop Item
PeerReviewed</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
    <tagContainer>
      <tag tagId="15">
        <name>Q Science (General)</name>
      </tag>
    </tagContainer>
  </item>
  <item itemId="1283" public="1" featured="0">
    <fileContainer>
      <file fileId="1433">
        <src>https://omeka.ibu.edu.ba/files/original/97f80968ecbf84e9fa61397f3c428f07.docx</src>
        <authentication>cae459a3db56f9616e8ce14714225e07</authentication>
      </file>
      <file fileId="1434">
        <src>https://omeka.ibu.edu.ba/files/original/01e92947115f5804971b587bd676a6c2.pdf</src>
        <authentication>dc6ccd9131157e9f3d547dbbbde8e07c</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="10036">
                    <text>COMPARISON OF ELECTROMAGNETIC RADIATION LIMITS FOR
EXTREMELY LOW FREQUENCIES IN EUROPEAN COUNTRIES
Mahmut Yalçın
Istanbul University, Istanbul, Turkey
myalcin@Istanbul.edu.tr
Keywords: Electromagnetic Radiation (EMR); Electromagnetic Pollution; Extremely Low
Frequency (ELF); Exposure Limits; Magnetic Field; High Power Lines.
ABSTRACT
Almost every member of modern society constantly lives in electromagnetic fields (EMF) much stronger than those found in nature. Power lines, computer monitors, various electrical equipment, radio, television, mobile phones and microwave ovens are examples of such EMF sources, and their potential effects on health continue to be the subject of controversy. This study investigates the extremely low frequency (ELF, 0-3000 Hz) region of the spectrum, which is radiated by transformers, household appliances, high power lines and other electrical goods. Research efforts to find a correlation between electromagnetic fields and their effects on human health have been going on for more than 25 years, but without significant success. In general, countries accept the standards of the International Commission on Non-Ionizing Radiation Protection (ICNIRP), the World Health Organization (WHO) and the European Committee for Electrotechnical Standardization (CENELEC). Some countries apply stricter limit values than these organizations. The International Agency for Research on Cancer (IARC) reviewed EMFs and cancer in June 2001 and classified low-frequency magnetic fields as “possibly” carcinogenic. Exposure limit values for EMFs are therefore very important. The best approach is to adopt the As Low As Reasonably Achievable (ALARA) principle as long as exact scientific results are lacking.

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="79">
            <name>Extent</name>
            <description>The size or duration of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="10028">
                <text>2131</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="10029">
                <text>COMPARISON OF ELECTROMAGNETIC RADIATION LIMITS FOR EXTREMELY LOW FREQUENCIES IN EUROPEAN COUNTRIES</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="10030">
                <text>YALCIN, Mahmut</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="10031">
                <text>Almost every member of modern society constantly lives in electromagnetic fields (EMF) much stronger than those found in nature. Power lines, computer monitors, various electrical equipment, radio, television, mobile phones and microwave ovens are examples of such EMF sources, and their potential effects on health continue to be the subject of controversy. This study investigates the extremely low frequency (ELF, 0-3000 Hz) region of the spectrum, which is radiated by transformers, household appliances, high power lines and other electrical goods. Research efforts to find a correlation between electromagnetic fields and their effects on human health have been going on for more than 25 years, but without significant success. In general, countries accept the standards of the International Commission on Non-Ionizing Radiation Protection (ICNIRP), the World Health Organization (WHO) and the European Committee for Electrotechnical Standardization (CENELEC). Some countries apply stricter limit values than these organizations. The International Agency for Research on Cancer (IARC) reviewed EMFs and cancer in June 2001 and classified low-frequency magnetic fields as “possibly” carcinogenic. Exposure limit values for EMFs are therefore very important. The best approach is to adopt the As Low As Reasonably Achievable (ALARA) principle as long as exact scientific results are lacking.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="45">
            <name>Publisher</name>
            <description>An entity responsible for making the resource available</description>
            <elementTextContainer>
              <elementText elementTextId="10032">
                <text>International Burch University</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="40">
            <name>Date</name>
            <description>A point or period of time associated with an event in the lifecycle of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="10033">
                <text>2013-05-24</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="10034">
                <text>Article
PeerReviewed</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="43">
            <name>Identifier</name>
            <description>An unambiguous reference to the resource within a given context</description>
            <elementTextContainer>
              <elementText elementTextId="10035">
                <text>ISSN 2233-1565     </text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </item>
  <item itemId="3086" public="1" featured="0">
    <fileContainer>
      <file fileId="3854">
        <src>https://omeka.ibu.edu.ba/files/original/88f415b31cdd1dd53a6ff7a143eaf0f9.pdf</src>
        <authentication>d00733f9a46e03b44d78ea56bf2948a9</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="23791">
                    <text>Comparison of Eleven Mathematical Models for describing the first
Lactation Curve of Holstein Cattle in Turkey
İsmail Keskin
Department of Animal Science,
Faculty of Agriculture,
Selcuk University, 42075, Konya, Turkey
ikeskin@selcuk.edu.tr
Nazire Memmedova
Department of Animal Science,
Faculty of Agriculture,
Selcuk University, 42075, Konya, Turkey
naziramamedova@yahoo.com
Fatma İlhan
Department of Animal Science,
Faculty of Agriculture,
Selcuk University, 42075, Konya, Turkey
fatmailhan@selcuk.edu.tr
Birol Dağ
Department of Animal Science,
Faculty of Agriculture,
Selcuk University, 42075, Konya, Turkey
bdag@selcuk.edu.tr
Fariz Mikailsoy
Department of Soil Science and Plant Nutrition,
Faculty of Agriculture,
Selcuk University, 42075, Konya, Turkey
farizm@selcuk.edu.tr

Abstract: In this study, eleven standard lactation curve models (Incomplete Gamma (WD),
Quadratic (Q), Cubic (C), Linear Hyperbolic Function (LH), Inverse Polynomial Function
(IP), Mixed Log (MIL), Exponential (WIL), Dhanoa (DH), Cobby and Le Du (CD),
Polynomial Regression (AS) and New Model (NM)) were used to predict a typical dairy cow
lactation, derived as the average daily milk yield of 105 complete first lactations of Holstein-Friesian cows in one herd. Milk yield was recorded daily on this farm. Total milk yield
(TMY) was calculated from the observed daily milk yields and was also predicted with each of the
11 models. The total milk yields predicted by the models were very close to each
other, and their differences from the observed TMY were not statistically significant
(P&gt;0.05). The models were therefore found to be adequate for estimating milk yield.
Determination coefficients (R²) of the models ranged from 67.15 % to 86.68 %. In
comparing the models, the TMY, peak yield (PY), peak time (PT), persistency (P), mean
square prediction error (MSPE), approximation error (ε), reliability criterion for estimating the
trustworthiness of the determination coefficient (θ), standard error (σ) and Durbin-Watson
(DW) values were evaluated together.
The AS, WD and newly developed NM models predicted the milk
yield of Holstein cows accurately.
Key Words: Holstein, Cows, Lactation Curve, Milk Yield, Mathematical Model

Introduction
Turkey has 11.3 million head of cattle, 70 % of which are improved cattle and their crossbreds. The
number of milked animals is 4.2 million, approximately 3 million of which are improved cattle and their
crossbreds. Cows produce 11.3 million tons of milk, and nearly 86 % of this production comes from
improved cattle and their crossbreds. However, lactation milk yields are very low (1.3 tons for native breeds,
2.7 tons for crossbreds and 3.9 tons for improved cattle) (TurkStat, 2007).
Producers aim to increase milk yield and decrease costs for profitable dairy cattle production. Persistency
is one of the most important factors determining milk production cost over the lactation. Milk yield begins
at calving, reaches its highest level between days 40 and 70, and then declines for the rest of the lactation.
As daily milk yield decreases, the production cost rises from day to day (Gengler, 1996;
Koçak and Ekiz, 2006). A mathematical model of the lactation curve provides summary information about dairy
cattle production, which is useful in making management and breeding decisions and in simulating a dairy
enterprise (Olori et al., 1999). In order to assess plausible forms of lactation curves, milk yield records collected
throughout the whole lactation are required. However, most small and medium-sized dairy farms in Turkey still
use classical milking systems, and milk yield is generally recorded monthly on these farms. Lactation curve
models enable such farms to evaluate the lactation as a whole: the shape of the lactation curve can be determined,
and animals with incomplete lactation records can be compared without bias for genetic evaluation purposes
(Keown and van Vleck, 1973). Knowledge of the lactation curve allows prediction of total milk
production from partial production measured on several test days early in lactation (Goodall and Sprevak, 1985).
Animals with a high milk yield potential can thus be identified before the whole lactation is
completed. Lactation curves can also be used for predicting lifetime milk production from early lactation
traits (Dalal et al., 2004), for culling, for assessing the nutritional and health status of animals (Dudouet, 1982; Sauvant and
Fehr, 1975) and for determining a suitable time to end milking (Chang et al., 2001).
The first mathematical model describing the lactation curve was developed by Brody et al. (1923). According to
Landete-Castillejos and Gallego (2000), this model was followed by the models reported by Sikka (1950), Nelder (1966), Wood
(1967), Dave (1971) and Jenkins and Ferrell (1984). The Wood model
has been used in most lactation curve studies because it captures the basic features of lactation curves
with only three parameters, a, b and c, which allow the calculation of average yield, peak yield and peak time.
This has made the Wood model the most widely used function for describing lactation
curves, and most alternative models are also based on it (Cobby and Le Du, 1978; Wilmink,
1987; Papajcsik and Bodero, 1988). Other mathematical models have been proposed to describe the
regular shape of the lactation curve in dairy cows from partial or incomplete data (Neal and Thornley, 1983;
Goodall and Sprevak, 1984; Batra, 1986; Morant and Gnanasakthy, 1989; Dijkstra et al., 1997; Olori et al., 1999;
Vargas et al., 2000). These models also allow analysis of systematic changes in milk yield caused by
environmental factors (Goodall and Sprevak, 1985; Morant and Gnanasakthy, 1989) and determination of milk
production characteristics such as persistency (Gengler, 1996), peak yield and time to peak yield (Masselin et al.,
1987; Gipson and Grossman, 1990).
The objective of this study was to compare the suitability of the WD, WIL, MIL, C, Q, DH, IP, CD, LH, AS
and NM models for describing first-lactation data of Holstein cows.

Materials and Methods
The data for this study came from the first lactation records of 105 Holstein cows raised in a private
enterprise in the Karapınar district (37° 42' N, 33° 35' E, 994 m above sea level) of Konya Province in the
Central Anatolia Region of Turkey. The records covered cows that calved in 2004. The cows were machine milked
twice daily, and milking records started on day 3 of lactation.
The enterprise uses a computer-based herd management program, and milk yield was recorded daily.
Average lactation length was 312±4.37 days. The experiment was carried out according to the guidelines of Selçuk
University Faculty of Agriculture in Konya Province.
In the study, eleven empirical mathematical models were fitted to the lactation data
and compared. These models are as follows:
(1) Incomplete Gamma (WD) (Wood, 1967):
$Y(t) = a t^{b} e^{-ct}$
(2) Quadratic (Q) (Dave, 1971):
$Y(t) = a + bt + ct^{2}$
(3) Cubic (C):
$Y(t) = a + bt + ct^{2} + dt^{3}$
(4) Exponential (WIL) (Wilmink, 1987), fitted with the parameter k fixed at 0.61:
$Y(t) = a + b e^{-kt} + ct$
(5) Mixed Log (MIL) (Guo and Swalve, 1995):
$Y(t) = a + b t^{1/2} + c \log t$
(6) Polynomial Regression (AS) (Ali and Schaeffer, 1987):
$Y(t) = a + bt + ct^{2} + d \log t + e (\log t)^{2}$
(7) Cobby and Le Du (CD) (Cobby and Le Du, 1978):
$Y(t) = a - bt - a e^{-ct}$
(8) Linear Hyperbolic Function (LH) (Bianchini, 1984):
$Y(t) = a + bt + c/t$
(9) Inverse Polynomial Function (IP) (Nelder, 1966):
$Y(t) = t / (a + bt + ct^{2})$
(10) Dhanoa (DH) (Dhanoa, 1981):
$Y(t) = a t^{bc} e^{-ct}$
(11) Newly developed model (NM):
$Y(t) = a t^{b} e^{-ct - d/t}$
For all models, Y(t) is the milk yield at day t of lactation;
a is linked to milk yield at the beginning of lactation,
b to the ascending phase before peak yield,
c to the declining phase after peak yield,
d (and e in the AS model) are further parameters that characterize the shape of the curve,
and e in the exponential terms is the base of the natural logarithm.
The parameters were estimated by nonlinear regression analysis using the Statistica program. The WIL model
has four parameters in total, with k in the exponent; following Wilmink (1987), a fixed value of k was used,
estimated at 0.61 in a preliminary analysis as the best-fitting value for the herd mean data.
The WIL model was then treated as a three-parameter curve in the analysis of individual animals.
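The eleven equations above can be written as ordinary functions of the lactation day t. The Python sketch below only evaluates the curves for given parameter values (the authors estimated the parameters by nonlinear regression in Statistica). All function names are hypothetical, and two assumptions are made that the paper does not state: "log" is the natural logarithm, and t counts from day 1 so that log t and 1/t are defined.

```python
import math

# Hypothetical helper functions for the eleven lactation curve models.
# Assumptions not stated in the paper: "log" is the natural logarithm and
# t is the day of lactation starting at 1, so log(t) and 1/t are defined.

K_WIL = 0.61  # WIL exponent k, held fixed at the value reported for this herd


def wood(t, a, b, c):
    """WD, incomplete gamma: a * t^b * e^(-ct)."""
    return a * t ** b * math.exp(-c * t)


def quadratic(t, a, b, c):
    """Q: a + bt + ct^2."""
    return a + b * t + c * t ** 2


def cubic(t, a, b, c, d):
    """C: a + bt + ct^2 + dt^3."""
    return a + b * t + c * t ** 2 + d * t ** 3


def wilmink(t, a, b, c, k=K_WIL):
    """WIL: a + b * e^(-kt) + ct, with k fixed rather than estimated."""
    return a + b * math.exp(-k * t) + c * t


def mixed_log(t, a, b, c):
    """MIL: a + b * t^(1/2) + c * log t."""
    return a + b * math.sqrt(t) + c * math.log(t)


def ali_schaeffer(t, a, b, c, d, e):
    """AS: a + bt + ct^2 + d * log t + e * (log t)^2."""
    lt = math.log(t)
    return a + b * t + c * t ** 2 + d * lt + e * lt ** 2


def cobby_le_du(t, a, b, c):
    """CD: a - bt - a * e^(-ct)."""
    return a - b * t - a * math.exp(-c * t)


def linear_hyperbolic(t, a, b, c):
    """LH: a + bt + c / t."""
    return a + b * t + c / t


def inverse_polynomial(t, a, b, c):
    """IP: t / (a + bt + ct^2)."""
    return t / (a + b * t + c * t ** 2)


def dhanoa(t, a, b, c):
    """DH: a * t^(bc) * e^(-ct), an alternative form of the Wood curve."""
    return a * t ** (b * c) * math.exp(-c * t)


def new_model(t, a, b, c, d):
    """NM: a * t^b * e^(-ct - d/t); reduces to WD when d = 0."""
    return a * t ** b * math.exp(-c * t - d / t)


# Example: WD curve at day 60 with the Table 1 estimates (a=15.34, b=0.161, c=0.0030).
wd_day60 = wood(60, 15.34, 0.161, 0.0030)
```

Writing NM and DH as explicit functions also makes their relation to WD easy to check: `new_model` with d = 0 and `dhanoa` with b·c in place of b give the same values as `wood`.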
Persistency (P) was calculated as:

$P(\%) = \frac{1}{k}\sum_{i=1}^{k}\frac{p_{i+1}}{p_{i}} \times 100$

where $p_i$ is the yield of record i, counting from peak time, and k is the number of records from peak time to
the end of lactation (Sturtevant, 1986).
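The persistency formula averages the ratio of each record's yield to the previous one, starting from the peak, and expresses it as a percentage. A minimal Python sketch, assuming the yields are supplied as a list of consecutive records beginning at peak time (the function name is hypothetical, and taking k ratios from k + 1 records is an assumption about the indexing). Read this way, a P of 99.7 % means yields fell by about 0.3 % per record on average.

```python
def persistency(yields_from_peak):
    """Persistency P(%) = 100 * mean of p_(i+1) / p_i over successive records.

    yields_from_peak: yields of consecutive records, starting at peak time and
    running to the end of lactation. Hypothetical helper; the indexing
    (k ratios from k + 1 records) is an assumption about the source formula.
    """
    ratios = [
        nxt / cur for cur, nxt in zip(yields_from_peak, yields_from_peak[1:])
    ]
    return 100.0 * sum(ratios) / len(ratios)


# Example: yields declining by 1 % per record give a persistency of about 99 %.
example = persistency([30.0, 29.7, 29.403])
```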
The estimated parameters were substituted into the original equations above to calculate predicted yields.
Residuals, defined as the absolute differences between the predicted and observed daily milk
yields, were calculated; the mean square prediction error (MSPE) was then computed for each fitted lactation curve
and averaged for each model (Ruiz et al., 2000).
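A minimal sketch of the MSPE step, with hypothetical function names: squared residuals are averaged within each lactation, and the per-cow values are then averaged for the model. Dividing by the number of records n is an assumption, since the paper does not give the divisor.

```python
def mspe(observed, predicted):
    """Mean square prediction error for one fitted lactation curve.

    Residuals are absolute differences |Y - Yhat|; their squares are averaged
    over the n records (dividing by n is an assumption, the paper does not say).
    """
    residuals = [abs(y - yhat) for y, yhat in zip(observed, predicted)]
    return sum(r ** 2 for r in residuals) / len(residuals)


def model_mspe(lactations):
    """Average the per-cow MSPE values for one model.

    lactations: list of (observed, predicted) pairs, one pair per cow.
    """
    values = [mspe(obs, pred) for obs, pred in lactations]
    return sum(values) / len(values)
```

Squaring makes the absolute value redundant for MSPE itself; it is kept only to mirror the residual definition in the text.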
True peak yield (PY) was determined from the test-day milk yield means of the 105 cows, and true peak
time (PT) was determined as the average day on which daily milk yield reached its maximum. Peak
times of the models were calculated by setting the first derivative of each function with respect to t equal to zero,
and PY values were obtained for each cow by substituting the PT values into the functions.
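For the WD model the derivative step has a closed form: dY/dt = a t^(b−1) e^(−ct) (b − ct) vanishes at PT = b/c, and substituting back gives PY = a (b/c)^b e^(−b). With the Table 1 WD estimates this gives PT ≈ 53.7 days and PY ≈ 24.8 l; these herd-level values need not equal the Table 2 means (72.84 days, 24.50 l), which appear to be averages of per-cow results. For models where solving dY/dt = 0 by hand is awkward (NM, for example), a grid search over lactation days is a simple numerical substitute; it is an illustrative alternative, not the authors' procedure, and all function names are hypothetical.

```python
import math


def wood(t, a, b, c):
    """WD curve Y(t) = a * t^b * e^(-ct)."""
    return a * t ** b * math.exp(-c * t)


def wood_peak(a, b, c):
    """Closed-form peak of the WD curve.

    dY/dt = a * t^(b-1) * e^(-ct) * (b - ct) is zero at t = b / c, so
    PT = b / c and PY = Y(PT) = a * (b/c)^b * e^(-b).
    """
    pt = b / c
    py = a * pt ** b * math.exp(-b)
    return pt, py


def numeric_peak(model, params, last_day=305):
    """Grid-search peak for any curve, for models without a simple closed form.

    Evaluates the model on whole days 1..last_day and returns (day, yield)
    at the maximum; a curve that only declines returns day 1.
    """
    best_day = max(range(1, last_day + 1), key=lambda t: model(t, *params))
    return best_day, model(best_day, *params)


pt, py = wood_peak(15.34, 0.161, 0.0030)  # Table 1 WD estimates
```

On whole days the grid search places the WD peak at day 54, the integer day nearest the analytic value b/c ≈ 53.7.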
The Durbin-Watson (DW) statistic was used as a measure of first-order positive autocorrelation to test whether
the residuals were randomly distributed (Grossman and Koops, 1988). DW was calculated for each lactation and each
model.
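The paper does not print the DW formula, so the sketch below uses the standard Durbin-Watson definition (the function name is hypothetical). DW lies between 0 and 4: values near 2 suggest no first-order autocorrelation, and values towards 0 indicate positive autocorrelation. Residuals here must be signed (observed minus predicted) and ordered by day, unlike the absolute residuals used for MSPE.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered by day of lactation.

    DW = sum over i = 2..n of (e_i - e_(i-1))^2, divided by sum of e_i^2.
    Use signed residuals (observed minus predicted), not absolute values.
    """
    numerator = sum((cur - prev) ** 2 for prev, cur in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator
```

A smooth fitted curve that stays above or below the observed yields for several days in a row produces runs of same-signed residuals, which pull DW towards 0.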
Approximation error was calculated as:

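$\varepsilon = \frac{100}{n}\sum_{i=1}^{n}\frac{|Y_i - \hat{Y}_i|}{Y_i}$

where $Y_i$ and $\hat{Y}_i$ are the observed and predicted yields of record i and n is the number of records.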
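The approximation error ε is a mean absolute percentage error. A minimal sketch with a hypothetical function name, assuming the numerator is the absolute difference between the observed yield and the predicted yield (the extracted formula does not show this clearly). Read this way, an ε of 12.74, as reported for AS, means its predictions missed the observed daily yields by about 12.7 % on average.

```python
def approximation_error(observed, predicted):
    """Approximation error (%) = (100 / n) * sum of |Y_i - Yhat_i| / Y_i.

    Observed values must be non-zero; each term is the relative error of one record.
    """
    n = len(observed)
    total = sum(abs(y - yhat) / y for y, yhat in zip(observed, predicted))
    return 100.0 / n * total
```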

Reliability criterion for estimating trustworthiness of the determination coefficient was calculated as:

$\theta = \frac{R^{2}\sqrt{n}}{1-(R^{2})^{2}}$

where n is the number of observations.
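A sketch of the reliability criterion, assuming it takes the form θ = R²√n / (1 − (R²)²), that is, R² divided by its approximate standard error (1 − (R²)²)/√n. The square root is an assumption, since it is not legible in the extracted formula. Larger θ means R² is estimated more reliably. The function name is hypothetical, and R² is passed as a fraction, not a percentage.

```python
import math


def reliability_criterion(r2, n):
    """theta = R^2 / m, where m = (1 - (R^2)^2) / sqrt(n) approximates the
    standard error of R^2 (the sqrt is an assumed reconstruction).

    r2: determination coefficient as a fraction between 0 and 1.
    n: number of observations used in the fit.
    """
    standard_error_r2 = (1 - r2 ** 2) / math.sqrt(n)
    return r2 / standard_error_r2
```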

Standard error was calculated as:

$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^{2}}{n-m}}$

where m is the number of parameters in the model.
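A sketch of σ with a hypothetical function name, assuming m is the number of fitted parameters in the model. Because σ² and MSPE both average squared residuals, σ should be close to √MSPE when n is much larger than m. The values reported for WD (σ = 2.47, MSPE = 5.95, and 2.47² ≈ 6.10) are in line with this.

```python
import math


def standard_error(observed, predicted, m):
    """sigma = sqrt( sum (Y_i - Yhat_i)^2 / (n - m) ).

    m is the number of fitted model parameters, so n - m is the residual
    degrees of freedom.
    """
    n = len(observed)
    sse = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))
    return math.sqrt(sse / (n - m))
```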

The models were compared with respect to their MSPE, correlation between yields and residuals (RESC),
R², TMY, peak yield (PY), peak time (PT), persistency (P), ε, θ and σ.

Results and Discussion
Lactation curve parameters for the Holstein cattle are given in Table 1. Parameter a, which expresses the
milk yield at the beginning of lactation, was 0.10 in the IP model and ranged from 12.71 to 25.81 in the other models.
The estimates of parameter a in this study were higher than those reported for the WD, Q, C, LHF and IP
models in Brown Swiss cows by Keskin and Tozluca (2004), for the Q, C and WIL models in Simmental cows by
Çilek and Keskin (2008) and for the WD, IP and AS models in Holstein-Friesian cows by Olori et al. (1999); lower
than those reported for the WD, MIL and AS models in Simmental cows by Çilek and Keskin (2008), for the WD, MIL, WIL
and AS models in Brown Swiss cows by Keskin et al. (2009) and for the MIL and WIL models by Olori et al. (1999);
and very close to the value reported for the WIL model in Holstein cows by Dědková and Němcová (2003).
Table 1. Estimates of the model parameters and their standard errors for the eleven models

Model* | a ± Sa | b ± Sb | c ± Sc | d ± Sd | e ± Se
WD | 15.34±0.428 | 0.161±0.0081 | 0.0030±0.00012 | – | –
Q | 22.04±0.419 | 0.024±0.0038 | 0.0002±0.00001 | – | –
C | 20.53±0.649 | 0.062±0.0124 | -0.0004±0.00008 | 0.00000054±0.0000001 | –
AS | 12.71±0.474 | 0.640±0.1099 | 2.1894±0.34652 | 6.330±0.5275 | 1.022±0.1533
WIL | 24.96±0.477 | 26.064±1.1519 | 0.0307±0.00140 | – | –
MIL | 13.25±0.483 | 1.886±0.0831 | 6.2888±0.25894 | – | –
DH | 20.38±0.759 | 49.468±11.7626 | 0.0020±0.00017 | – | –
LHF | 25.81±0.438 | 0.034±0.0015 | -17.1840±0.74544 | – | –
CD | 25.44±0.439 | 0.032±0.0015 | 18.3727±3.31720 | – | –
IP | 0.10±0.012 | 0.034±0.0009 | 0.0001±0.000005 | – | –
NM | 17.11±0.704 | 0.145±0.0118 | 0.0029±0.00013 | 0.153±0.0587 | –

* WD: Incomplete Gamma, Q: Quadratic, C: Cubic, LHF: Linear Hyperbolic Function (LH), IP: Inverse Polynomial
Function, MIL: Mixed Log, WIL: Exponential, DH: Dhanoa, CD: Cobby and Le Du, AS: Polynomial
Regression and NM: New Model
The highest estimate of parameter b was obtained for the WD model and the lowest for the DH
model. Estimates of parameter c ranged from -17.1840 to 18.3727.
The lactation curve parameters estimated for this herd generally differed from those of previous studies. This
may be due to different environmental conditions or to differences in management and administration
with respect to milk production. It is also well known that the Holstein breed is more
productive in temperate climatic zones and that its milk production capacity may vary between geographical regions.
For higher milk production, parameter a, which expresses milk yield at the beginning of lactation, and parameter b,
which indicates the rate of increase of the curve, should be higher, while parameter c, which indicates the rate of
decline of the curve, should be lower.
The lactation curves of the Holstein cows are shown in Figure 1. As seen in this figure, the fitted curves of the WD, AS
and NM models are very close to the observed values. The total milk yields predicted by the different models are very
close to the observed total milk yield, and the differences between them were not significant (P&gt;0.05).

Table 2. Comparison of the models for estimating total milk yield (TMY), peak yield (PY), time to peak yield (PT), persistency (P), correlation between yields and residuals (RESC) and goodness-of-fit statistics (R², MSPE, ε, σ, θ and DW)

Model | TMY (l) | PY (l) | PT (day) | P (%) | R² (%)
WD | 6407±150 ns | 24.50±0.368 bc | 72.84±17.20 a | 99.7±0.01 ns | 76.17±0.014
Q | 6370±148 ns | 22.04±0.414 d | 0.00±0.000 d | 99.5±0.01 ns | 75.23±0.013
C | 6338±145 ns | 20.53±0.649 d | 0.00±0.000 d | 99.6±0.03 ns | 77.87±0.011
AS | 6370±148 ns | 23.96±0.419 c | 56.14±3.267 a | 99.8±0.03 ns | 80.65±0.011
WIL | 6370±148 ns | 24.90±0.448 bc | 10.16±0.15 c | 99.9±0.05 ns | 86.68±0.171
MIL | 6370±148 ns | 25.29±1.078 bc | 62.79±10.82 a | 99.8±0.04 ns | 75.25±0.015
DH | 6372±148 ns | 24.46±0.358 bc | -46.17±11.66 d | 99.6±0.01 ns | 63.59±0.025
LHF | 6370±148 ns | 24.71±0.366 bc | 23.69±0.771 c | 99.9±0.38 ns | 72.22±0.016
CD | 6364±148 ns | 25.08±0.395 bc | 13.40±1.288 c | 99.8±0.03 ns | 69.93±0.019
IP | 6335±147 ns | 22.88±0.424 b | 29.31±1.287 bc | 99.9±0.02 ns | 67.01±0.019
NM | 6309±159 ns | 24.65±0.366 bc | 45.67±2.89 ab | 99.6±0.03 ns | 76.88±0.014
Actual | 6369±149 ns | 28.72±0.422 a | 69.38±4.85 a | 99.9±0.03 ns | –

Goodness of fit statistics
Model | MSPE | RESC | ε | σ | θ | DW
WD | 5.95±0.321 | -0.13 | 14.00 | 2.47 | 45.52 | 0.849
Q | 6.11±0.300 | 0.30 | 13.71 | 2.43 | 39.61 | 0.776
C | 5.75±0.712 | 0.31 | 14.12 | 2.55 | 41.70 | 0.804
AS | 4.90±0.256 | 0.00 | 12.74 | 2.18 | 54.36 | 0.949
WIL | 7.04±0.370 | 0.17 | 15.33 | 2.59 | 34.74 | 0.716
MIL | 5.99±0.329 | -0.17 | 14.29 | 2.39 | 44.71 | 0.833
DH | 7.90±0.542 | 0.61 | 15.90 | 2.69 | 36.90 | 0.737
LHF | 6.57±0.356 | 0.04 | 14.95 | 2.51 | 39.01 | 0.775
CD | 6.88±0.370 | 0.13 | 15.13 | 2.57 | 36.77 | 0.732
IP | 7.57±0.418 | -0.35 | 15.78 | 2.70 | 31.69 | 0.673
NM | 5.67±0.315 | -0.17 | 13.93 | 2.33 | 47.30 | 0.877

a, b, c, d: Means within a column with different superscripts differ significantly (P&lt;0.01); ns: not significant.

Figure 1. Shape of the lactation curve according to the models (milk yield, l, against day of lactation, days 1 to 301; observed data and the AS, WD, WIL, IP, LH, MIL, NM, DH, Q, C and CD fits)
Total milk yield (TMY), peak yield (PY), time to peak yield (PT), persistency (P), correlation between
yields and residuals (RESC) and goodness-of-fit statistics (R² and MSPE values) of the models are given in
Table 2. In this study, the differences between estimated and observed peak yields were significant
(P&lt;0.01). However, the peak yields of the Q and C models did not differ significantly from each other, and the WD,
WIL, MIL, DH, LHF, CD, IP and NM models gave very similar predictions. Since the actual peak yield was 28.72 l,
all models underestimated the peak yield. Lactation curves based on actual daily milk yields generally fluctuate
strongly, whereas the yields estimated by the models do not, and the estimated peak yields do not show the sharp
increase seen in the real data. On the other hand, the actual lactation data of well-managed herds would be
expected to fluctuate little.
The peak time obtained from actual milk yields and the PT predicted by the WD, AS and MIL models were
very close, and the differences between them were not significant (P&gt;0.05). However, the peak time of the DH model
was negative, and the peak times of the Q and C models were estimated as zero. This is likely because the
curves estimated from the present data by the DH, Q and C models decrease from the beginning to the end of lactation,
as seen in Figure 1.
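Why a polynomial model can report a peak time of zero is easiest to see for the quadratic: dY/dt = b + 2ct vanishes at t* = −b/(2c), which is a maximum only when c is negative. If t* falls at or before the start of lactation, the fitted curve declines throughout and the peak sits at the first day. The sketch below (hypothetical function name) makes that rule explicit. Clamping at day 0 is an assumption about how the zero values in Table 2 arose, not a procedure stated by the authors.

```python
def quadratic_peak_day(b, c):
    """Peak day of the Q curve Y(t) = a + bt + ct^2, restricted to t >= 0.

    dY/dt = b + 2ct = 0 gives t* = -b / (2c), a maximum only when c is negative.
    If t* lies at or before day 0 the fitted curve declines from the start,
    so the peak is placed at day 0 (an assumed convention).
    """
    if c >= 0:
        raise ValueError("c must be negative for the curve to have a maximum")
    return max(0.0, -b / (2 * c))
```

For example, b = 0.2 and c = −0.001 put the peak at day 100, whereas b = −0.05 and c = −0.0001 give a curve that falls from the first day, so the sketch returns 0.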
Persistency (P) values of all models were very close to each other, and the differences between them
were not significant (P&gt;0.05).
The P values found in this study were higher than those reported for the WD, Q, C, LHF and IP
models in Brown Swiss cows by Keskin and Tozluca (2004) and for the Q, C, WD, MIL, AS and WIL models in
Simmental cows by Çilek and Keskin (2008).
Higher determination coefficients indicate that the independent variables of a model explain the dependent
variable well. R² values ranged from 63.59 % to 86.68 % across the models, with the lowest in the DH model (63.59 %)
and the highest in the WIL model (86.68 %). The best fit was thus obtained with WIL, followed by AS, while DH fitted
worst. The lowest MSPE values were obtained for the AS model, followed by the WD model, and the highest MSPE value
for the DH model. The R² values in this study were lower than those reported for the WD, Q, C, LHF and IP models in
Brown Swiss cows by Keskin and Tozluca (2004), for the Q, C, WD, MIL, AS and WIL models in Simmental cows by
Çilek and Keskin (2008) and for the WD, IP, WIL, MIL and AS models in Holstein-Friesian cows by Olori et al. (1999).
MSPE values were higher than those reported for the Q, C, WD, MIL, AS and WIL models in Simmental
cows by Çilek and Keskin (2008).

The lowest ε value was obtained for the AS model, followed by the Q, NM and WD models, and a similar
pattern was found for σ. The highest θ value was estimated for the AS model, followed
by the NM and WD models. Durbin-Watson values for all models were well below 2 (0.673 to 0.949), indicating
positive autocorrelation of the residuals, which may pose problems for statistical inference about the models.
Correlations between residuals and observed milk yield (RESC) ranged from -0.35 (IP) to 0.61
(DH). Estimated residuals generally increased with observed yields, though only slightly except for the C, Q
and DH models. As reported by Olori et al. (1999), high daily yields are the most difficult to predict, although
very low yields also cause problems.

Conclusion
The TMY, PY, PT and P values of the AS, WD and NM models were very close to the actual values, and their
MSPE, RESC, ε and σ values were the lowest among the models. The highest R² and θ values were also found in these
models.
Assessing the TMY, PY, PT, P, R², MSPE, RESC, ε, σ, θ and DW statistics together, it can be concluded that
the AS, WD and newly developed NM models make it possible to predict first-lactation milk yields of Holstein
cows close to actual values.

Acknowledgments
This research was funded in part by a grant from Selcuk University (BAP). The authors wish to thank the staff
of KAR-YEM AŞ, Konya, Turkey.

References
Ali, T.E. &amp; Schaeffer, L.R. (1987). Accounting for covariance among across weeks of peak production test day milk yields in
dairy cows. Can. J. Anim. Sci., 67: 637-644.
Batra, T.R. (1986). Comparison of two mathematical models in fitting lactation curves for pureline and crossline dairy cows.
Can. J. Anim. Sci., 66: 405-414.
Bianchini, E.S. (1984). Estudo da curva de lactação de vacas da raça Gir (Study of the lactation curve of Gir cows).
Doctoral Thesis. Faculdade de Medicina de Ribeirão Preto, USP.
Brody, S.A., Ragsdale, A.C. &amp; Turner, C.W. (1923). The rate of decline of milk secretion with the advance of the period of
lactation. J. Gen. Physiol., 5: 441-444.
Chang, Y.M., Rekaya, R., Gianola, D. &amp; Thomas, D.L. (2001). Genetic variation of lactation curves in dairy sheep: a
Bayesian analysis of Wood’s function. Livest. Prod. Sci., 71: 241-251.
Cobby, J.M. &amp; Le Du, Y.L.P. (1978). On fitting curves to lactation data. Anim. Prod., 26: 127-133.
Çilek, S. &amp; Keskin, I. (2008). Comparison of six different mathematical models to the lactation curve of Simmental cows
reared in Kazova State Farm. J. Anim. Vet. Adv., 7 (10): 1316-1319.
Dalal, D.S., Malik, Z.S., Chhikara, B.S. &amp; Ramesh, C. (2004). Prediction of lifetime milk production from early lactation
traits in Hariana cattle. Indian J. Anim. Sci., 74:11.
Dave, B.K. (1971). First lactation curve of Indian water buffalo. JNKVV Res. J. 5:93.
Dědková, L. &amp; Němcová, E. (2003). Factors affecting the shape of lactation curves of Holstein cows in the Czech Republic.
Czech J. Anim. Sci., 48 (10): 395–402.
Dhanoa, M.S. (1981). A note on an alternative form of the lactation model of Wood. Anim. Prod., 32: 349.
Dijkstra, J., France, J., Dhanoa, M.S., Maas, J.A., Hanigan, M.D., Rook, A.J. &amp; Beever, D.E. (1997). A model to describe
growth patterns of the mammary gland during pregnancy and lactation. J. Dairy Sci., 80: 2340-2354.
Dudouet, E. (1982). Courbe de lactation théorique de la chèvre et applications (theoretical lactation curve of the goat and its
applications). Le Point Vétérinaire, 14: 53-61.
Gengler, N. (1996). Persistency of lactation yields: a review. Proceedings of the Interbull Annual Meeting, Gembloux,
Belgium, January 21-23. Bulletin No. 12. Department of Animal Breeding and Genetics, SLU, Uppsala, Sweden, 87-96 pp.
Gipson, T.A. &amp; Grossman, M. (1990). Lactation curves in dairy goats: a review. Small Rumin. Res., 3: 383-396.
Goodall, E.A. &amp; Sprevak, D. (1985). A Bayesian estimation of the lactation curve of a dairy cow. Anim. Prod., 40: 189-193.
Grossman, M. &amp; Koops, W.J. (1988). Multiphasic analysis of lactation curves in dairy cattle. J. Dairy Sci., 71: 1598-1608.
Grossman, M., Kuck, A.L. &amp; Norton, H.W. (1986). Lactation curves of purebred and crossbred dairy cattle. J. Dairy Sci., 69:
195-203.
Guo, Z. &amp; Swalve, H. H. (1995). Modeling of the lactation curve as a sub-model in the evaluation of test day records.
Proceedings of the Interbull Annual Meeting. Prague, Czech Republic, Sep. 7-8 1. Bulletin No. 11. Department of
Animal Breeding and Genetics, SLU, Uppsala, Sweden, 4 pp.
Jenkins, T.G. &amp; Ferrell, C.L. (1984). A note on lactation curves of crossbred cows. Anim. Prod., 39: 479-482.
Keown, J.F. &amp; Van Vleck, L.D. (1973). Extending lactation records in progress to 305 day equivalent. J. Dairy Sci., 56: 1070-1079.
Keskin, I. &amp; Tozluca, A. (2004). Describing of different mathematical models for lactation curve and estimation of control
interval in dairy cattle. Selcuk Univ. The J. of Agric. Fac., 18 (34): 11-19.
Keskin, I., Dag, B. &amp; Sariyel, V. (2009). Fitness of four different mathematical models to the lactation curve of Brown Swiss
cows in Konya Province of Turkey. Can. J. Anim. Sci., 89: 195-199.
Koçak, Ö. &amp; Ekiz, B. (2006). Studies on factors affecting the milk yield and lactation curve of Holstein cows in intensive
conditions. The Journal of The Faculty of Veterinary Medicine Istanbul University, 32 (2): 1-13.
Landete-Castillejos, T. &amp; Gallego, L. (2000). Technical Note: The ability of mathematical models to describe the shape of
lactation curves. J. Anim. Sci., 78: 3010-3013.
Masselin, S., Sauvant, D., Chapoutot, P. &amp; Milan, D. (1987). Adjustment models for lactation curves. Ann. Zootechnie, 36:
171-206.
Morant, S.V. &amp; Gnanasakthy, A. (1989). A new approach to the mathematical formulation of lactation curves. Anim. Prod.,
49: 151-162.
Neal, H.D. &amp; Thornley, J.H.M. (1983). The lactation curve in cattle: a mathematical model of the mammary gland. J. Agric.
Sci., (Camb.) 101: 389-400.
Nelder, J.A. (1966). Inverse polynomials, a useful group of multi-factor response functions. Biometrics, 22: 128.
TurkStat (2007). Livestock statistics. Turkish Statistical Institute. Ankara, Turkey.
Olori, V.E., Brotherstone, S., Hill, W.G. &amp; McGuirk, B.J. (1999). Fit of standard models of the lactation curve to weekly
records of milk production of cows in a single herd. Livest. Prod. Sci., 58: 55-63.
Papajcsik, I.A. &amp; Bodero, J. (1988). Modelling lactation curves of Friesian cows in a subtropical climate. Anim. Prod., 47:
201-207.
Ruiz, R., Oregui, L.M. &amp; Herrero, M. (2000). Comparison of models for describing the lactation curve of Latxa sheep and an
analysis of factors affecting milk yield. J. Dairy Sci., 83: 2709-2719.
Sauvant, D. &amp; Fehr, P. (1975). Classification des courbes de lactation et d’évolution de la composition du lait de la chèvre
(classification of lactation curves and of the evolution of composition of goat milk). J. Recherche Ovine et Caprine
(Paris), 2-4 Dec. 90-107 pp.
Sikka, L.C. (1950). A study of lactations as affected by heredity and environment. J. Dairy Res., 17: 231-252.
Statistica for Windows PC 5.0, (1995). Stat Soft, Inc. 2325 East 13th Street, U. S. A.
Sturtevant, E. L. (1986). Influence of Distance From Calving on Milk Yield. Agri. Exper. Station Rep., 22-23, Geneva.
Vargas, B., Koops, W.J., Herrero, M. &amp; Van Arendonk, J.A.M. (2000). Modelling extended lactation of dairy cows. J. Dairy
Sci., 83: 1371-1380.
Wilmink, J.B.M. (1987). Adjustment of test day milk, fat and protein yield for age, season and stage of lactation. Livest. Prod.
Sci., 16: 335-348.
Wood, P.D.P. (1967). Algebraic model of lactation curve in cattle. Nature, Lond. 216: 164-165.
</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="79">
            <name>Extent</name>
            <description>The size or duration of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="23785">
                <text>435</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="23786">
                <text>Comparison of Eleven Mathematical Models for describing the first Lactation Curve of Holstein Cattle in Turkey</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="23787">
                <text>Keskin, İsmail
Memmedova, Nazire
İlhan, Fatma
Dağ, Birol
Mikailsoy, Fariz</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="23788">
                <text>In this study, eleven standard lactation curve models (Incomplete Gamma (WD), Quadratic (Q), Cubic (C), Linear Hyperbolic Function (LH), Inverse Polynomial Function (IP), Mixed Log (MIL), Exponential (WIL), Dhanoa (DH), Cobby and Le Du (CD), Polynomial Regression (AS) and New Model (NM)) were used to predict a typical dairy cow lactation, derived as the average daily milk yield of 105 complete first lactations of Holstein-Friesian cows in one herd. Milk yield was recorded daily on this farm. Total milk yield (TMY) was calculated from the observed daily milk yields and was also predicted with each of the 11 models. The total milk yields predicted by the models were very close to each other, and their differences from the observed TMY were not statistically significant (P&gt;0.05). The models were therefore found to be adequate for estimating milk yield. Determination coefficients (R²) of the models ranged from 67.15 % to 86.68 %. In comparing the models, the TMY, peak yield (PY), peak time (PT), persistency (P), mean square prediction error (MSPE), approximation error (ε), reliability criterion for estimating the trustworthiness of the determination coefficient (θ), standard error (σ) and Durbin-Watson (DW) values were evaluated together. The AS, WD and newly developed NM models predicted the milk yield of Holstein cows accurately.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="40">
            <name>Date</name>
            <description>A point or period of time associated with an event in the lifecycle of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="23789">
                <text>2010-06</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="23790">
                <text>Conference or Workshop Item
PeerReviewed</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
    <tagContainer>
      <tag tagId="15">
        <name>Q Science (General)</name>
      </tag>
    </tagContainer>
  </item>
  <item itemId="3003" public="1" featured="0">
    <fileContainer>
      <file fileId="3771">
        <src>https://omeka.ibu.edu.ba/files/original/45f095bd0421f6b027d9bd2d4d6effc8.pdf</src>
        <authentication>b5b7a252e2972987e8be2e1fe7d2472c</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="23210">
                    <text>2nd International Symposium on Sustainable Development, June 8-9 2010, Sarajevo

Comparison of Islamic, Traditional and Alternative Utility Theories
Sümeyye DEMİRSOY
International University of Sarajevo
Bosnia and Herzegovina
Mehmet CAN
International University of Sarajevo
Bosnia and Herzegovina
mcan@ius.edu.ba

Abstract: Utility theory has long been used to explain decision making under uncertainty, and the roots of
utility theory lie in moral philosophy. Moral philosophy is concerned with concepts of good and bad, right and wrong,
virtue, justice, etc. Utilitarianism, a field of moral philosophy, is the one most directly related to utility
theory. Throughout human history, from the Prophet Abraham to the Greek philosophers Socrates, Aristotle and
Epicurus, and to the Islamic scholars al-Kindi, al-Farabi, al-Razi, Ibn-i Sina, Ibn-i Rushd and Ibn-i Haldun,
thinkers have discussed ethics and the concept of utility.

1. Introduction
Although utility is an economic term that measures the satisfaction or desirability derived from the
consumption of goods and services, its roots lie in moral philosophy, which deals with the concepts of good and
bad, right and wrong, virtue, justice and happiness. These concepts have been constituents of ethics throughout the
history of humanity.
Concepts such as good and bad, right and wrong, virtue, justice and happiness have concerned
human civilizations for millennia. Historically, the foundations of human ethics were laid by divine revelations
through prophets.
Muslims identify the prophets of Islam as those humans chosen by Allah to teach mankind. Humans may
rely on revelation or tradition to identify prophets. Each prophet brought the same basic ethical ideas: the
belief in a single God and the avoidance of idolatry and sin.
Muslims regard Adam as the first prophet and Muhammad as the last. Islamic theology recognises as many
as 124,000 prophets. The Qur'an identifies 25 prophets by name, starting with Adam and ending with Muhammad.
Five of them, the Rasuls, receive the highest reverence for their perseverance: Ibrahim (Abraham), Moosa (Moses),
Dawud (David), Isa (Jesus) and Muhammad.

1.1 Prophet Ibrahim (Abraham)
Ibrahim was born into a household of idolaters in the kingdom of Babylon. He announced to his people: “O my
people, I have turned my face towards Him Who created the heavens and the earth, and never shall I give partners to
Allah.” He has the power to make the stars rise and set.

1.2 Prophet Musa (Moses)
Musa, the son of Imran, was born in Egypt, where at that time the kings were known as Fir‘awns (Pharaohs). The
first statement about work ethics in the Torah is in Genesis: “In the sweat of thy face shalt thou eat bread, till thou
return unto the ground; for out of it wast thou taken: for dust thou art, and unto dust shalt thou return.” (Torah,
Genesis 3/19)
The Torah advises respecting the rights of one’s neighbours and treating them well: “When thou dost
lend thy brother any thing, thou shalt not go into his house to fetch his pledge. Thou shalt stand abroad, and the man
to whom thou dost lend shall bring out the pledge abroad unto thee. And if the man be poor, thou shalt not sleep with
his pledge: In any case thou shalt deliver him the pledge again when the sun goeth down, that he may sleep in his
own raiment, and bless thee: and it shall be righteousness unto thee before the Lord thy God.” (Torah, Deuteronomy
24/10-13)
Another statement concerns the poor and also speaks of living together with one’s brothers: “And if thy brother
be waxen poor, and fallen in decay with thee; then thou shalt relieve him: yea, though he be a stranger, or a
sojourner; that he may live with thee. Take thou no usury of him, or increase: but fear thy God; that thy brother may
live with thee.” (Torah, Leviticus 25/35-36)
“If thou lend money to any of my people that is poor by thee, thou shalt not be to him as an usurer, neither
shalt thou lay upon him usury. If thou at all take thy neighbor's raiment to pledge, thou shalt deliver it unto him by
that the sun goeth down: For that is his covering only, it is his raiment for his skin: wherein shall he sleep? and it
shall come to pass, when he crieth unto me, that I will hear; for I am gracious.” (Torah, Exodus 22/25-27)
“And when ye reap the harvest of your land, thou shalt not wholly reap the corners of thy field, neither shalt
thou gather the gleanings of thy harvest. And thou shalt not glean thy vineyard, neither shalt thou gather every grape
of thy vineyard; thou shalt leave them for the poor and stranger: I am the LORD your God.” (Torah, Leviticus 19/9-10)
“And if thy brother be waxen poor, and fallen in decay with thee; then thou shalt relieve him: yea, though
he be a stranger, or a sojourner; that he may live with thee.” (Torah, Leviticus 25/35)

1.3 Prophet Dawud (David)
Dawud (David) was not only an illustrious Prophet of the Israelites but also their king. The Holy
Qur'an affirms: “And Allah gave him the kingdom and wisdom, and taught him of that which He willed.” (2:251)
And it was said unto him: “O Dawud! Lo! We have set you as a vicegerent in the earth; therefore, judge aright
between mankind and follow not desire.” (38:26)
He lived in Bait-ul-Lahm (Bethlehem), ten miles from Jerusalem. He prayed: “Our Lord!
Pour out constancy on us and make our steps firm and help us against those who are disbelievers.” (2:249)
Allah revealed the Zabur (Book of Psalms) to Prophet Dawud. It contains lessons for the guidance of his people.

1.4 Buddha
The evidence of the early texts suggests that the Buddha was born in a community that was on the
periphery, both geographically and culturally, of fifth-century BCE northeast India. This community seems to have
had two categories of people, masters and servants.
The Four Noble Truths of Buddhism:
1. Life as we know it ultimately is or leads to suffering/uneasiness (dukkha) in one way or another.
2. Suffering is caused by craving. This is often expressed as a deluded clinging to a certain sense of existence,
   to selfhood, or to the things or phenomena that we consider the cause of happiness or unhappiness. Craving
   also has its negative aspect, i.e. one craves that a certain state of affairs not exist.
3. Suffering ends when craving ends. This is achieved by eliminating delusion, thereby reaching a liberated
   state of Enlightenment (bodhi).
4. Reaching this liberated state is achieved by following the path laid out by the Buddha.

2. The Greek Philosophers
For ancient Greek philosophers the question ‘how should I live?’ took a fundamentally prudential or
self-regarding form. For them it amounted to an inquiry into how a man could secure his own happiness, fulfilment or
perfection. Benevolence, altruism, philanthropy and concern for the happiness of others occupied a secondary position
in their ethical recommendations, where such concern was conceived as a condition of the individual’s self-realisation.
Greek philosophers in general, and Plato and Aristotle in particular, found a place for restricted benevolence by
emphasising the role of friendship in a fully satisfying life. Aristotle made a somewhat disdainful liberality part of
his conception of the ethically ideal or ‘magnanimous’ man.
It can be said that utilitarianism, a field of moral philosophy, is the one most directly related to utility theory.
Utilitarianism can be understood as a movement for legal, political and social reform that flourished in the nineteenth
century. It can also be understood as the ideology of that movement, and it is also a general ethical theory. As a
theory of ethics, it provides a criterion for distinguishing right actions from wrong ones and an account of the nature
of the moral judgements that characterise actions as right or wrong.
Utilitarianism can be expressed as the combination of two principles: (i) the consequentialist principle that
the rightness or wrongness of an action is determined by the goodness or badness of the results that flow from it, and
(ii) the hedonist principle that the only thing good in itself is pleasure and the only thing bad in itself is pain.
The doctrine can also be expressed as a single principle, the greatest happiness principle: the rightness of an
action is determined by its contribution to the happiness of everyone affected by it (Quinton, 1973).
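Since this paper is concerned with utility theory, it may help to see how the two principles combine into the single
greatest happiness principle. The following is only a stylised sketch, not Quinton’s notation; every symbol is an
assumption introduced here for illustration. Let a be an action from a set 𝒜 of available actions, P(a) the set of
persons affected by it, and p_i(a) and q_i(a) the pleasure and pain that person i experiences as a result, assuming
further that pleasure and pain can be measured on a common scale.

$$ H(a) = \sum_{i \in P(a)} \left[ p_i(a) - q_i(a) \right], \qquad a^{*} = \arg\max_{a \in \mathcal{A}} H(a) $$

The bracketed term reflects the hedonist principle (ii), since only pleasure counts as good and only pain as bad;
judging a solely by the summed results H(a) reflects the consequentialist principle (i); and calling right the action
a* that maximises H(a) over everyone affected corresponds to the greatest happiness principle.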
Greek moralists unreflectively assumed a consequentialist position in developing their more or less
prudential life-styles. The only way in which they conceived it to be possible to justify a type of conduct was by
reference to the results to which it gave rise (Quinton, 1973).

2.1 Socrates (BC 470-399)

Socrates’ ethical intellectualism has a eudaemonistic character: he asserted that the highest good for any human
being is happiness. According to Socrates, people’s true happiness is promoted by doing what is right. When people’s
true utility is served (by tending their own souls), they achieve happiness. Happiness is evident from its long-term
effect on the soul.
According to him, whatever action a man chooses is motivated by his desire for happiness. Knowledge, virtue and
wisdom are all the same thing, and a man chooses an action according to what he thinks will bring him the greatest
happiness. Therefore, the more a man knows, the greater his ability to reason out the correct choice and to choose
those actions which truly bring him happiness.
According to Socrates, to answer the question ‘what is happiness?’, an individual should first ask himself: if all
his needs were met, or if he had enough power to do anything, would he really be happy? When an individual observes
himself closely, he sees that even if all these needs were met he would not be happy enough. On the contrary, he sees
that he has experienced many disappointments, and that only when he is in harmony with himself will he really be
happy. People who are not in harmony with themselves can never be properly happy.
Socrates believes that psychic harmony is the greatest good, and that its result is moral behavior. He
also believes that if you have a healthy body and soul, then you are in psychic harmony with yourself. In the ideal
soul, the reasoning part and the feelings (desires for honour) rule over the appetitive part (desires for wealth, food,
etc.). A properly ordered soul experiences a sense of well-being or psychological health. Thus, psychological health
is something distinct from psychological stability, since it depends on psychic harmony.
According to Socrates, immoral behavior is the result of an unbalanced personality and leads people to
irrational behavior. Psychic harmony is a psychological condition that makes someone moral, but this harmony is not
itself a motive: moral behavior comes from people’s own beliefs and desires. If one is bad or unjust in the social
sense, it is because of one’s sensuality, greed or vanity. According to Socrates, where there is psychic harmony, the
motives for injustice in the social sense will be eliminated.
Socrates did not overcome the prejudice of Greek intellectualism in ethics: for him, it is enough to know virtue in
order to be virtuous. Everyone wishes to be happy, and if someone does not attain happiness, it is because he does not
know the way that leads to happiness. Consequently, so-called evil men are in reality only ignorant. Thus vice is
synonymous with ignorance, and knowledge of the good is synonymous with virtue. That is why Socrates, who
intended to form a virtuous youth, restricted his teaching to the search for moral concepts.
The foundations of the Socratic system of ethics can be summarised as follows:
i. a choice is rational if and only if it is a choice of what is best for the agent;
ii. something is good for an agent if and only if it is morally right.
The cornerstone of Socratic ethics is the self-interested concern for happiness, that is, for one’s own good. But
this concern requires that we act in accordance with what is just and noble, that is, the moral good. The identity of
one’s own good with the moral good is the basis of these Socratic foundations. In the Gorgias, Socrates shows that
one’s own good coincides almost completely with the moral good, since the utility of non-moral goods, such as wealth,
depends on the possession of moral excellence.
According to Socrates, the content of goodness covers utility and pleasure. That is why some historians of ethics
regard Socrates as an Epicurean, and hence as a utilitarian. However, the pleasure Socrates has in mind is not harmful
to the intellect and the soul. Here the intellect has an informative and determining role; thus pleasure for Socrates is
under the control of the intellect. Moreover, Socrates’ view of utility is not individualistic but public.

2.2 School of Cyrene – Aristippus (BC 435-366)

The school of Cyrene was founded in the 5th century BC by Aristippus, a disciple of Socrates. Aristippus was a
hedonist who urged the pre-eminent claims of bodily pleasure as an end.
He starts from the question ‘what is a happy life?’ and answers: ‘a happy life is one with as much pleasure and as
little suffering as possible’. To make life happy, one should therefore fill it with as much pleasure and as little
suffering as possible. To do this, one should reduce one’s needs and adjust to a life of limited needs; by making do
with limited needs, one can benefit from all the pleasures of life.
Aristippus teaches a kind of art of living wisely, which can be acquired by giving up the desires that make people
slaves of themselves. Thus the school of Cyrene derived hedonism from Socrates’ eudaimonism. This Cyrenaic concept of
hedonism was later taken up by Epicurus (Aster, 2005).

2.3 School of Cynicism – Antisthenes (BC 455-365)

Antisthenes, who outlined the themes of the school, was the first philosopher of the Cynic school. He had
been a pupil of Socrates in the late 5th century BC. Antisthenes sharply disputed Aristippus’ assumptions about
pleasure.
According to Antisthenes, people should look for real happiness in the desire for inner freedom. Those who attain
real happiness know how to remain insensitive and indifferent to both pleasure and suffering, and this indifference
to pleasure and suffering brings inner freedom.

2.4 Plato (BC 427-347)

Plato, the founder of the Academy in Athens, was a classical Greek philosopher. His mentor was Socrates and his
student was Aristotle. According to Plato, nothing is fine without moderation, and what is pleasant becomes truly
pleasant only through moderation. He also discusses moderation between bodily pleasures and intellectual goods.
Unlike Aristotle’s well-developed concept of happiness (discussed below), Plato’s concept of happiness is
more obscure. According to Plato, the Good is the source of intelligibility. He asserted that the highest goal of all
education is knowledge of the Good. According to him, human beings aim at the good, and nobody voluntarily chooses
evil.
Plato thinks that the masses are incapable of grasping the truth. He illustrated this in his allegory of the cave,
where he suggests that the masses cannot see the truth directly but are satisfied with an illusion of reality.
According to Plato, the good is the source of intelligibility, and happiness is the attainment of intelligibility.
Thus Plato describes happiness as the goal of life.
There is an important difference between the other Socratics and Plato. Both Aristippus and Antisthenes are
individualists: for both of them, the starting point is the individual, and neither deals with supra-individual
realities such as the state, history or society. If people want to be really happy, they should stand on their own and
not depend on other people. On this point Plato disagrees with his two fellow pupils. According to Plato, people are
never on their own but always live with other people; if we isolate an individual from the society he lives in, we cut
him off from his own resources. Moreover, the institution called the ‘state’ is like a human being: whatever condition
the society as a whole is in, the individual lives in the same condition. Thus, to understand a human being, one must
look at the state in which he lives. Plato therefore rejects the thought of the other Socratics (Aristippus and
Antisthenes). Although they appear to be followers of Socrates, Plato thinks that they do not share his perspective.
According to Plato, in contrast to the individualistic tendency of Aristippus and Antisthenes, Socrates not only showed
skill in the art of living but also became the first example of a moral principle founded on social life (Aster, 2005).

2.5 Aristotle (BC 384-322)

Aristotle was a Greek philosopher, a student of Plato and the teacher of Alexander the Great. His teaching
on virtue and ethics is set forth in his Nicomachean Ethics.
Greek philosophers agree that the main purpose of human life is ‘happiness’, but they differ from each other
about what happiness is. The Aristotelian method differs from the others because, according to Aristotle, each being
has its own activity. Knowing the specific activity of the human being shows us what kind of end a human being seeks
to reach, and so we learn what ‘real happiness’ is.
According to Aristotle, happiness (eudaimonia) is an activity of the soul in accordance with virtue, and it is
the highest of all goods. Happiness is the first principle and cause of all goods, and it is a self-sufficient activity
always chosen for itself. Unlike other goods, happiness is the only thing chosen solely for itself and never for the
sake of other things. Aristotle believes that amusement is not self-sufficient like happiness, and he distinguishes
between the life of amusement and the happy life.
Aristotle proposes two possible paths to happiness, a life of virtuous activity and a life of theoria, and asks
which of the two is better. The former defines happiness as practical virtue, which needs external goods. The latter,
theoria, is the contemplation of eternal truths over an entire lifetime, which is the highest activity of reason.
Aristotle chooses theoria, because the life of practical virtue achieves happiness only in a lesser sense, owing to its
need for material goods in this life, whereas the life of theoria limits the need for material goods and comes closest
to the perfect happiness (eudaimonia) enjoyed by the gods.
According to Aristotle, the human being is by nature rational. If a rational man behaves rationally and
moderately, he acts in accordance with his nature. ‘Thinking’ and ‘knowing’ are the highest human activities. But
what can be the activity of reason (ratio/intellect) in practical life? Each of the virtues is a state of being that
naturally seeks its mean: every virtue lies between two extremes, as the mean between them. This mean is not
mathematical; it is a boundary that can be found by reason. The virtuous habit of action is always an intermediate
state between the opposed vices of excess and deficiency. For example, with respect to the enjoyment of pleasures,
temperance (sophrosúnê) is a mean between the excess of intemperance and the deficiency of insensibility. Greek
thought always searches for harmony and moderation and dislikes extremes, and Aristotle, too, opposes all kinds of
extremes. Thus Aristotle opposes views, like those of the Cynics, that reject possessions, and emphasises the relative
value of possessions: used moderately, they can even bring meaning and value. Another of Aristotle’s views is that
desires and urges should not be ignored entirely but kept under the control of reason.
According to Aristotle, there is no direct connection between the good and pleasure. Three points should be taken
into account: first, pleasure is not the main principle of a moral life; second, pleasure occurs as a result of an
action that aims at virtue; and third, virtue lies in the action that results in pleasure.

2.6 Epicurus (BC 341-271)

Around 300 BC, two more schools joined the list of schools of philosophy: the Stoa, and Epicureanism, which took
its name from its founder Epicurus. These two schools held contradictory opinions about life and knowledge.
As mentioned in earlier parts of this work, the Cynic and Cyrenaic schools also held contradictory opinions
about ethics. According to the Cynics, it is important to exert complete mastery over desires. After the Cynics,
similar thoughts were repeated by the Stoa: it is virtuous to master passions and desires, because virtue enables us to
remain indifferent to life and death. The school of Cyrene, by contrast, sees the real purpose of life as pursuing
pleasure and escaping pain. A disagreement similar to that between the Cynics and the Cyrenaics was later seen between
the Stoa and Epicureanism.
These two schools, the Stoa and Epicureanism, sustained their existence by keeping their dispute alive. At the
same time, however, they shared some opinions. The first point on which they agree is that the human being is the
subject of philosophy. Both draw portraits of a ‘superman’, although each interprets this ideal differently. For the
Stoa, the superman is one who overcomes demands and desires and knows how to disregard both life and death; the
Stoics present apathy (lack of interest or concern) as the goal of human life. Epicureanism, on the other hand, finds
its goal in ataraxia (freedom from worry). It can be noticed, however, that there is no great difference between
ataraxia and apathy.
According to the Stoa, the first principle is that people need to understand that they are organs of the unity
called the world. The second principle is that people need to know their own place in the world and so adapt
themselves to the destiny chosen for them. Epicurus, however, thinks that the world proceeds according to blind and
spontaneous necessity. If people’s destiny is determined by unforeseeable chance, then they can concern themselves
with what is the product of their own will. Thus people will remain indifferent to life and death and, by behaving
rationally, will know how to distinguish the things that bring happiness.
Epicurus adopted in ethics the principle of attaining pleasure and escaping pain, but held that a person should
do this wisely. One should avoid intense pleasures which ultimately bring pain, and should not take an interest in
anything beyond what is necessary, because extremes cause pain. People should know how to keep away from temporary
and specious values such as fame and glory. Temporary values always drive people towards more, and this more never
ends; that is why people always remain in unrest. Thus people should be interested in “moral pleasures”, which do not
lead to dissatisfaction. According to Epicurus, to be happy it is necessary to live moderately, to pursue pleasures
which are moral, and to behave in accordance with all of these.
Epicurus established his school in Athens in 306 BC. In the surviving writings of Epicurus, there is little of
direct relevance to the connection between utility and justice. However, virtue, including justice, was not intended to
limit pleasure. According to Bailey (1928), Epicureanism is ‘a system of uncompromising egoistic hedonism’. As
Scarre (1994) puts it, ‘just as the Epicurean community practiced economic self-sufficiency within the walls of
its garden, the Epicurean man cultivates an inner self-sufficiency, a contentment in his own physical and mental
states and a suppression of unnecessary desires’. The only perfect pleasure was a condition of ataraxia, in which one
lives quietly in bodily health with little physical and psychological distress (Rosen, 2003).
Although no pleasure is bad, some (those involving less pain) are purer than others. Epicurus connected
pleasure with health, and pain with disease. All pleasures were good in the sense that health was good, even though
some pleasures were mixed. If health was good, disease of the body or soul was the greatest evil.
In Epicureanism the greatest pleasure was defined as the removal of all pain, and hence the Epicurean lived
quietly and peacefully in the real or metaphorical Garden (Rosen, 2003). The most important virtue for Epicurus was
prudence, and considerable emphasis was placed on the egoistic pleasures connected with friendship. On the other
hand, little attention was given to social values and instincts.
When it comes to ‘justice’ in Epicurus’ system, it can be said that ‘justice’ means achieving security from
the attacks of other people. Epicurus’ conception of justice is that of a pledge of mutual advantage to restrain
men from harming one another and to save them from being harmed (Epicurus, 1926). Elsewhere, Epicurus wrote of
justice as being of advantage in the requirements of men’s dealings with one another (Epicurus, 1926). Here Epicurus
used the Greek phrase ‘sumpherei en tais chreiais’ for ‘advantage in the requirements’. Rosen claims that the Greek
noun ‘chreia’ also possessed a range of meanings and might be translated as ‘need’, ‘use’ or ‘utility’.
In all societies in which it was not possible to make compacts not to harm one another, nothing was either just
or unjust. Although justice potentially applied to all who made contracts not to harm one another, such justice might
be applied differently in different societies and under different circumstances. Where a law previously considered
just no longer had usefulness or secured advantage, it was no longer just (Epicurus, 1926). According to Alberti
(1995), ‘justice is the realization of utility by means of a contract’. The emphasis on utility allows law to be
separated from justice, rejecting the view, found in Plato and Aristotle, that all law is just. The emphasis on utility
also leads to a notion of justice that differs from both nomos (legal justice) and physis (natural justice)
(Rosen, 2003).
Justice was an invention of the wise for their own good. Epicurus summed up matters with brutal directness
and claimed that the laws exist for the sake of the wise, not that they may not do wrong, but that they may not suffer
it (Bailey, 1928). Law and justice were matters of convenience which the wise person devised and approved.
Epicurus had no reason to make justice a positive part of the human condition except to enable people to obtain
‘peace of soul’ (DeWitt, 1954). “It represented a painful burden, and in its application as punishment justice could be
extremely painful. All that could recommend it was its utility to the wise. Other members of society might have less
invested in justice, as they were not cultivating their gardens as were Epicureans, and might well gain less from rules
concerning not harming others. However, so long as they accepted the compact, they would be assisting themselves,
as well as not harming the wise in society.” (Rosen, 2003).

3. Medieval Islamic Philosophers
In the seventh century, translation movements from Greek into Arabic started, and in the time of
Caliph al-Mansur this movement reached its peak. The study of Islamic ethics began to take shape in the third
century after the emergence of Islam, under the influence of Greek ethics, including Stoicism, Platonism and
Aristotelianism. Al-Kindi, the first philosopher of Islam, was influenced by Socrates and Diogenes the Cynic, as seen
in his ethical writings. Other influences can be seen in the work of Platonists such as Abu Bakr al-Razi or
Neoplatonists such as al-Farabi; Aristotelian influences can be seen in the works of al-Farabi, Ibn Sina and Ibn Rushd.

3.1 Al-Kindi (d. 873)
Abu Yusuf Ya‘qub ibn Ishaq al-Kindi (d. 873) was the first philosopher of Islam and also the first author on
philosophical ethics. In Baghdad, al-Kindi was involved in the scientific movement of translating Greek texts
into Arabic. His starting point was Greek philosophy, and the classical bibliographers report that he wrote a
number of ethical treatises reflecting an interest in Socratic and Cynic thought.
In al-Kindi’s writings, the personalities of Socrates and Diogenes the Cynic are united and both emerge as
ideal instances of virtue and asceticism (Fakhry, 1998). Moreover, the Stoic idea of apatheia (freedom from passion)
and indifference to the vicissitudes of fortune are set out in fluent terms. According to al-Kindi, the antidote to
pain is to consider that pain results either from our own actions or from the actions of others. In the former case,
it is the individual’s duty to avoid doing what causes the pain. In the latter case, averting the pain is either in
our power or it is not. If it is in our power, we certainly ought to avert it; if it is not, we should not suffer at
the prospect of injury in the hope that it might somehow be turned away. The suggestion to shun material possessions
as temporary acquisitions reflects the influence of the Stoic philosophers.

3.2 Abu Bakr al-Razi (d. 925)

Another philosopher, Abu Bakr al-Razi (d. 925), who was influenced by Plato, refers in his book al-Tibb al-Ruhani
(The Spiritual Physic) to Plato as ‘the master of the philosophers and their leader’ and to Socrates as the ‘ascetic
and spiritual’ sage.
A Socratic-Platonic theme in al-Razi’s writings is the folly of the hedonistic life, which turns man into a slave.
Many of people’s pleasures are temporary and unattainable, and people are beset by anxiety or pain. According to
al-Razi, however, the true philosopher will not succumb to pain, because he understands that nothing in this world is
permanent, and that whatever cannot be averted should be ignored, since grieving over it is the product of passion and
not of reason. Al-Razi says in al-Falsafiya: ‘For reason summons us only to what is susceptible of bringing about
profit sooner or later; grief does not bring any advantage... That is why the perfectly rational man will only follow
the summons of reason ... and will never follow the summons of passion or allow himself to be led by it or get close
to it.’
Like Socrates and Plato, al-Razi believes that the soul, on leaving the body, will return to its original residence
in the intelligible world after passing through an endless cycle of purifications. Death is a logical consequence of
our being human and an essential part of the definition of man. However, al-Razi adds another argument, derived from
Epicurus: death is the deprivation of sensation, so that at death man is stripped of the sensations of both pleasure
and pain, which is a better condition than living in pain. That is why ‘according to the judgment of reason the
condition of death is better than the condition of life’ (Rasa‘il al-Razi al-Falsafiya).

3.3 Al-Farabi (d. 951)

Abu Nasr al-Farabi (d. 951) was known among his peers as the “second master” (muallim-i sani),
Aristotle being the first (muallim-i evvel). Al-Farabi was the first systematic writer on philosophical questions in
Islam. He also contributed to ethical discussions and wrote a commentary on parts of the Nicomachean Ethics,
which had been translated into Arabic by Ishaq bin Hunayn.
Al-Farabi follows Aristotle in ethics, for example in dividing the virtues into moral and intellectual virtues
(Fakhry, 1998). According to him, the moral virtues are perfections of the appetitive part of the soul, whereas the
perfections of the intellectual part are practical reasoning, good judgement, wisdom and sound understanding.
Al-Farabi also follows Aristotle’s account of justice, which consists in the equitable distribution of ‘common goods’
in the city or the state. Every member of the city or state is entitled to a share of these ‘common goods’, such as
security, wealth, dignity and public office.
Al-Farabi departs from Aristotle and the other Greek philosophers in believing in the life hereafter in the
Qur'anic way. According to al-Farabi, nations and the citizens of cities attain happiness, worldly happiness here and
supreme happiness in the life hereafter, when four human needs are met: theoretical virtues, deliberative virtues,
moral virtues and practical arts. Worldly happiness is necessary for the attainment of supreme happiness in the
hereafter. According to him, happiness is the absolute good, and achieving happiness is the purpose of life. Whenever
a person’s soul reaches perfection, happiness is achieved. According to al-Farabi, if an individual’s desire for
happiness is weak and he or she has other purposes in life, the result will be evil.
Theoretical virtues, the first of the four mentioned above, consist of the sciences; their purpose is an
understanding of all beings through these sciences. Deliberative virtues concern voluntary intelligibles that vary
across time and place, such as events occurring by accident or by will, for example disasters or war. An individual
cannot possess deliberative virtue without possessing moral virtue: a person who wishes the good for himself/herself
or for others must have a virtuous moral character. According to al-Farabi, theoretical virtues, deliberative
virtues, moral virtues and practical arts are all inseparable.
In his famous work al-Madina al-Fadila (The Virtuous City), which describes a city whose people cooperate and help
each other in order to attain happiness, al-Farabi again reflects his devotion to the Islamic perception of utility. By contrast,
the non-virtuous (ignorant) city is one whose people do not know happiness. Al-Farabi discussed the souls of the citizens of
these two cities and believed that the souls of the citizens of the virtuous city are immortal, whereas the
souls of the citizens of the ignorant city are mortal and their destiny is to suffer. Consequently, al-Farabi held that
political association should be directed towards the attainment of happiness.
In general, al-Farabi was greatly influenced by both Aristotle and Plato in his philosophy, particularly in his concept of
happiness. At the same time, his thought was framed by Islam. He drew on each of these three influences
to form a complete description of happiness, so his concept of happiness is a product of his understanding of
Greek philosophy and Islam. It combines Plato's concept of the good, Aristotle's concept of eudaimonia (happiness),
and the Islamic concept of jihad al-nafs (struggle of the soul).
Plato's and Aristotle's concepts were discussed above. The Islamic concept of jihad al-nafs
means the struggle of the soul. According to Islam, God created man to achieve bliss (happiness) in the next life
through a clearly defined struggle in this life called jihad. In the words of the Qur'an, "And whosoever strives
(jahada), strives (yujahidu) only for himself. Surely Allah is self-sufficient, above need of His creatures" (Qur'an
29:6). The one who strives in this sense struggles to turn his inner self towards a new way of living, one that
understands the true reality, in which the material is only a small portion.

3.4 Ibn Sina (Avicenna) (d. 1037)

Ibn Sina (Avicenna) (d. 1037) is one of the foremost philosophers of the medieval Hellenistic Islamic
tradition and one of its most important practitioners of philosophy. He exercised a strong influence over later
Islamic philosophers and over medieval Europe as well. As al-Farabi's successor, Ibn Sina wrote a very short tract
on ethics, and in psychology he closely follows the Platonic model.
Ibn Sina speaks of the laws that need to be laid down concerning the moral habits (akhlaq) and traits
(adat) that lead to justice. He divides the soul into rational, irascible, and concupiscent parts, which
correspond to the virtues of wisdom, courage, and temperance respectively; justice, finally, is the 'summation' of all
three. According to Ibn Sina, the enforcement of justice within the state is necessary, through the caliph as the
sovereign of the world and God's vicegerent on earth. The virtues of temperance, courage, and wisdom serve
the well-being of human beings in this world, and they can be followed adequately without theoretical wisdom. Yet Ibn Sina
presents theoretical wisdom as so important that one can attain happiness only by acquiring it in addition to these
three virtues, which together add up to justice. Ibn Sina distinguishes himself from al-Farabi by insisting on the possibility
of acquiring temperance, courage, and practical wisdom (or justice) without possessing theoretical wisdom. In other
words, unlike al-Farabi, Ibn Sina does not regard all the virtues as intellectual or as grounded in sound
intellectual understanding.
Separating practical virtue from theoretical virtue does not fully account for Ibn Sina's
moral teaching. His treatises suggest that moral habits are directed towards liberating the soul from the
body. They therefore serve the ultimate goal of theoretical virtue, namely the soul's free perception of God and
the divine intelligences. It is not clear, however, how the moral habits lead to justice. The only explanation that
comes to mind concerns the extent to which some human beings center their thoughts and activities on otherworldly concerns.
Ibn Sina also differs from al-Farabi here, since he starts with basic human needs and ascends from them to the
larger issue of lawgiving and providing for justice. Al-Farabi, by contrast, begins by thinking about ultimate
human happiness, that is, about the highest ends of human beings rather than their humblest
beginnings, or about their noble concerns rather than their basic needs.

3.5 Ibn Rushd (Averroes) (d. 1198)

Ibn Rushd (Averroes) (d. 1198) is regarded as one of the most important Islamic philosophers. He set out to
integrate Aristotelian philosophy with Islamic thought in twelfth-century Islamic Spain. He produced
commentaries on Aristotle's Nicomachean Ethics and on Plato's Republic, both of which are relevant to his ethical theory.
According to Ibn Rushd, the principal virtues correspond to the perfection of the three parts of the soul:
the rational, the irascible, and the concupiscent. He then describes justice along Platonic lines as the 'harmony' of
the three corresponding virtues of wisdom, courage, and temperance. As Aristotle stated in the Nicomachean Ethics,
justice has two subdivisions: common or universal justice, corresponding to 'perfect virtue', and particular justice. However,
Ibn Rushd does not identify happiness with the contemplative life, but rather with conjunction (ittisal) with the active
intellect, which the Muslim Neoplatonists had regarded as man's ultimate goal.
In Muslim thought, everything believers need to know about moral behaviour is encapsulated in Islam.
However, Ibn Rushd argued that a distinction should be drawn between moral notions and divine commands, and
here he follows an Aristotelian approach (Leaman). According to Ibn Rushd, the answer to the question "what is the
purpose of a human being?" is that one of the ultimate aims is to be happy and to avoid actions that lead to
unhappiness. Moral virtue leads to happiness: if people do what they should do in accordance with their nature,
they will be able to achieve happiness. This happiness may be interpreted as a mixture of social and religious
activities or as an entirely intellectual ideal. However, neither religion nor philosophy would approve of an entirely
intellectual ideal as the ultimate aim for the majority of the community. Someone may try to live apart from the
community and concentrate entirely on intellectual pursuits, but this way of living is inferior to a life that concentrates
on intellectual thought while remaining integrated within the practices of a particular society.
Although Ibn Rushd worked within an Islamic context, he does not identify happiness and misery with any
aspect of the afterlife, since he was unable to accept the traditional view of the afterlife. Oliver Leaman, who
has written many works on Ibn Rushd, argues here that without religious imagery, ordinary believers may find it difficult to
understand that our moral actions affect not only ourselves but the happiness of the whole community, not just at a
particular time or in a particular place but as a species. Through bad behaviour we damage our own chances of
human flourishing, and this damage affects our personal opportunities for achieving happiness and maturing as
people. It also weakens society. According to Leaman, while it is possibly true that the misery
of evil-doing does not follow us personally after our death, it may well follow the community. The notion of an
afterlife points to the wider terms of reference within which moral action has its life.

3.6 Ibn Khaldun

Ibn Khaldun, another philosopher, who lived in the 14th century, centred his economic ideas on
justice, hard work, cooperation, moderation, and fairness. He emphasises al-adl (justice) as the bedrock
of the economy and holds that a lack of justice leads to the breakdown of the state. Some of his writings may appear secular,
for instance: "Civilisation and its well-being as well as business prosperity depends on production and people's
efforts in all directions in their own interest and profit" (Muqaddimah, Volume 2). However, Ibn Khaldun insisted
that man must avoid evils, must improve himself, and must give preference to the matters of the next world over
this world (Muqaddimah, Volume 1).
According to Ibn Khaldun, extravagance and luxurious living lead to the destruction of the state: "Sedentary
people are much concerned with all kinds of pleasure. They are accustomed to luxury and success in worldly
occupations and indulgence in worldly desires. Therefore, their souls are adorned with all kinds of blameworthy and evil
qualities" (Muqaddimah, Volume 1, 225).
Another issue Ibn Khaldun emphasises is cooperation. He says that "the power of the individual human
being is not sufficient for him to obtain the food he needs [...]. Through cooperation, the needs of a number of persons,
many times greater than their own number, can be satisfied" (Muqaddimah, Volume 1, 69) (Ibn Haldun, 1977).
The economic philosophy of Ibn Khaldun has been outlined only briefly above. His Muqaddimah covers a
large number of other economic topics, such as money, value, markets, population, growth, and international trade.
When dealing with micro- or macroeconomic issues, he demonstrated remarkable competence in generating theories. In
his work, Ibn Khaldun synthesises ideas learnt from the Qur'an and Sunnah, and from other sources, into
powerful theories.

4. Conclusion
Concepts such as good and bad, right and wrong, virtue, justice, and happiness have concerned
human civilizations for millennia. Historically, the foundations of human ethics were laid by divine revelations
through prophets. Greek philosophers and Muslim scholars then contributed to the theory until the beginning of
modern times.
With the beginning of the 20th century, industrial society began to transform into an information society, and risk
and uncertainty became primary, defining features of human behaviour. With these changes, the new structure
of society is multi-dimensional, more complicated, and uncertain. While expected utility theory became accepted and widely
used in economics, critical voices grew louder. Critics of the deductive, abstract, and purely rationalist method of
economics focused especially on the uncertainty and risk that arose from that analysis. Among the alternatives, only one
came to be seen as a strong rival to expected utility theory: prospect theory, developed by Daniel Kahneman
and Amos Tversky (Kahneman and Tversky, 1979). They approached the concept of utility from a cognitive point of
view, and their studies were among the earliest studies of human decision-making by
cognitive psychologists. The development of the theory in modern times may be the subject of another work.

References
Alberti, A. (1995). The Epicurean theory of law and justice. In A. Laks and M. Schofield (Eds.), Justice and Generosity: Studies
in Hellenistic Social and Political Philosophy. Cambridge: Cambridge University Press.
Bailey, C. (1928). The Greek Atomists and Epicurus. Oxford: Clarendon Press.
DeWitt, N. W. (1954). Epicurus and His Philosophy. Minneapolis: University of Minnesota Press.
Epicurus (1926). Epicurus, the Extant Remains (C. Bailey, Trans.). Oxford: Clarendon Press.
Ibn Haldun (1977). Mukaddime. Ankara: Onur Yayinlari.
Ibn Khaldun (1967). An Introduction to History: The Muqaddimah. London: Routledge and Kegan Paul.
Kahneman, D., and Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-291.
Quinton, A. (1973). Utilitarian Ethics. London: Open Court.
Rosen, F. (2003). Classical Utilitarianism from Hume to Mill. New York: Routledge.
Scarre, G. (1994). Epicurus as forerunner of utilitarianism. Utilitas, 6, 219-231.
</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="79">
            <name>Extent</name>
            <description>The size or duration of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="23204">
                <text>252</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="23205">
                <text>Comparison of Islamic, Traditional and Alternative Utility Theories</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="23206">
                <text>DEMİRSOY, Sümeyye
CAN, Mehmet</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="23207">
                <text>Attempts to explain decision making under uncertainty have long relied on utility theory, and the roots of utility theory lie in moral philosophy. Moral philosophy concerns concepts such as good and bad, right and wrong, virtue, and justice. Utilitarianism, a field of moral philosophy, relates most directly to utility theory. Throughout human history, from the Prophet Abraham to the Greek philosophers Socrates, Aristotle, and Epicurus, and on to the Islamic scholars al-Kindi, al-Farabi, al-Razi, Ibn-i Sina, Ibn-i Rushd, and Ibn-i Haldun, thinkers have discussed ethics and the concept of utility.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="40">
            <name>Date</name>
            <description>A point or period of time associated with an event in the lifecycle of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="23208">
                <text>2010-06</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="23209">
                <text>Conference or Workshop Item
PeerReviewed</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
    <tagContainer>
      <tag tagId="7">
        <name>HB Economic Theory</name>
      </tag>
    </tagContainer>
  </item>
  <item itemId="2318" public="1" featured="0">
    <fileContainer>
      <file fileId="3372">
        <src>https://omeka.ibu.edu.ba/files/original/f58b35e8a7210776e674d7a7ed5166d5.pdf</src>
        <authentication>a89995abd6839c8e0f3623e4e403d800</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="18670">
                    <text>3rd International Symposium on Sustainable Development, May 31 - June 01 2012, Sarajevo

Comparison of linear regression and neural network models forecasting tourist arrivals to Turkey
Selcuk Cankurt, Abdulhamit Subasi
International Burch University, Faculty of Engineering and Information Technologies,
Francuske Revolucije bb. Ilidza, Sarajevo, 71000, Bosnia and Herzegovina.
E-mail: asubasi@ibu.edu.ba
Abstract
This paper develops statistical and machine learning methods for estimating tourist arrivals,
which are one of the key inputs for planning sustainable tourism development. Tourism is
arguably one of the world's largest and fastest-growing industries, and sustainable tourism
development is one of the most promising generators of sustainable economic
development. Realistic tourism projections based on accurate tourism forecasting contribute
much to sustainable tourism development. Planning and developing sustainable tourism is a
complex problem, but one of its starting points is the accurate forecasting of tourist arrivals.
In this study, linear regression and neural network multilayer perceptron (MLP)
implementations are considered for multivariate tourism forecasting for Turkey. A comparison
of forecasting performance in terms of the correlation coefficient (R), relative absolute error
(RAE), and root relative squared error (RRSE) shows that the MLP regression model performs better.
Keywords: Tourism forecasting; Tourism demand modelling; Time series; Linear regression;
Neural networks; Multilayer perceptron; Multivariate tourism forecasting.
1. INTRODUCTION
Tourism demand forecasts are of great economic value for both the public and the private sectors.
Tourism products, such as unfilled airline seats, unoccupied hotel rooms, and unused
facilities, cannot be stocked because of their perishable nature (Archer, 1987). Accurate
forecasting of tourism demand is therefore of great importance to the sectors concerned with
tourism, as it enables accurate and efficient planning (Petropoulos, Nikolopoulos, &amp; V., 2005; Pai
&amp; Hong, 2005).
According to the World Travel &amp; Tourism Council (WTTC), travel and tourism is the biggest
industry in the world. Since 1992, the tourism sector has been the largest industry and the largest
employer in the world (Aslan, Kaplan, &amp; Kula, 2008).
Turkey's economy grew by an average of 6.0% per year over the last decade. Turkey currently ranks
16th among the world's largest economies and is the fastest-growing economy
among the members of the Organization for Economic Cooperation and Development (OECD).
The new goals of Turkish tourism were to establish an efficient tourism sector with high
international competitiveness while preserving and enhancing the country’s natural and
historical environment and cultural heritage in a sustainable manner (Ministry of Culture,
2007).
Statistical methods such as linear regression are suitable for data with seasonal or trend
patterns, while artificial neural network techniques are also efficient for data influenced by
special events, such as promotions or extreme crises (Efendigil, Önüt, &amp; Kahraman, 2009).
Forecasting is one major application area of ANNs (Gooijer &amp; Hyndman, 2006); see also (Zhang, Patuwo,
&amp; Hu, 1998) and (Hippert, Pedreira, &amp; Souza, 2001). ANNs are increasingly
used to forecast tourism demand (Law &amp; Au, 1999; Law, 2000). Pattie and Snyder
(1996) used a back-propagation neural network model with two hidden layers to forecast
monthly overnight stays in US national park systems. Law and Au (1999) presented a feed-forward
neural network with six input nodes and one output node to forecast arrivals in Hong Kong.
For more application areas of ANNs, see (Al-Saba &amp; El-Amin, 1999), (Beccali, Cellura,
Lo Brano, &amp; Marvuglia, 2004), (Hobbs, Helman, Jitprapaikulsarn, Konda, &amp; Maratukulam,
1998), (Sozen, Arcaklioglu, &amp; Ozkaymak, 2005), (Sabuncuoglu, 1998), (Vellido, Lisboa, &amp;
Vaughan, 1999), (Wong, Lai, &amp; Lam, 2000), (Ayata, Cam, &amp; Yıldız, 2007), and (Efendigil, Önüt,
&amp; Kahraman, 2009).
Building on this brief review of the literature, especially the work on tourism demand forecasting,
this study develops a multivariate linear regression model and a multilayer perceptron (MLP) neural
network regression model for forecasting the number of tourists coming to Turkey.

2. THEORETICAL BACKGROUND
2.1 Linear regression
Multiple linear regression (MLR) models the linear relationship, called the regression function,
between a dependent variable and more than one independent variable, unlike simple linear
regression, which has a single independent variable. The dependent variable is sometimes
called the predictand, and the independent variables are called the predictors.
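The model for multiple linear regression, given n observations, is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$

where
$x_{ij}$ is the value of the $j$th predictor for the $i$th observation,
$\beta_j$ is the coefficient on the $j$th predictor,
$\beta_0$ is the intercept, also known as the bias in machine learning,
$p$ is the total number of predictors,
$y_i$ is the $i$th value of the predictand, and
$\varepsilon_i$ is the error.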
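The following minimal Python sketch shows one way to estimate the coefficients of this model. It assumes ordinary least squares solved through the normal equations and is not the WEKA implementation used in this study. The function names fit_mlr and predict and the toy data are hypothetical.

```python
# Minimal sketch (assumption: ordinary least squares via the normal equations).
# Pure Python; the function names and the toy data are hypothetical.

def fit_mlr(X, y):
    """Return [b0, b1, ..., bp] minimising the sum of squared errors."""
    # Prepend a constant 1 to every row so that b0 acts as the intercept (bias).
    A = [[1.0] + [float(val) for val in row] for row in X]
    k = len(A[0])
    # Normal equations: (A^T A) b = A^T y.
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    v = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        p = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        v[col], v[p] = v[p], v[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # Back substitution, from the last coefficient to the first.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


def predict(b, row):
    """Evaluate b0 + b1*x1 + ... + bp*xp for one observation."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))


# Usage on noise-free toy data generated from y = 1 + 2*x1 + 3*x2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coef = fit_mlr(X, y)  # approximately [1.0, 2.0, 3.0]
```

The toy data contain no error term, so the fitted coefficients recover the generating values. With real data, the estimates would only approximate them.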

2.2 MLP Approach
Artificial neural networks (ANNs), often simply called neural networks (NNs), are
computing structures inspired by biological neural networks. A neural network is made
of interconnected processing units, usually called neurons. It learns by adjusting the
strength of the interconnections, that is, by altering values called weights in response to
the input data (Haykin, 1999). Each neuron sums its weighted inputs and passes this net input
through an activation function in order to normalize it and produce an output (Jones, 2008).
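A single neuron as described above can be sketched in a few lines of Python. The sketch assumes the logistic sigmoid as the activation function and adds a bias term to the weighted sum. The names sigmoid and neuron and all numeric values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real net input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Output of a single artificial neuron."""
    # Net input: weighted sum of the inputs plus a bias term.
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function converts the net input into the neuron's output.
    return activation(net)


# Two inputs, two weights and a bias (all values hypothetical):
# net = 0.4*0.5 + 0.2*(-1.0) + 0.1 = 0.1, so the output is sigmoid(0.1).
out = neuron([0.5, -1.0], [0.4, 0.2], bias=0.1)
```

Passing activation=lambda z: z turns the same unit into a linear neuron, which is the kind of node used at the output of a regression network.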
The multilayer network architecture consists of an input layer, one or more hidden layers, and
one output layer. An activation function is applied at both the hidden and the output nodes. The
sigmoid function can be used in the hidden layers to squash each neuron's output into the range
(0, 1) and thereby introduce non-linearity into the network. A linear activation function, however,
must be used in the output layer to predict numerical values in regression problems. Supervised
learning algorithms used with perceptron networks include the perceptron learning algorithm,
least-mean-squares learning, and backpropagation. Backpropagation, which builds on the Widrow–Hoff
training rule, is one of the most popular approaches for training multilayer feedforward neural networks
(Bishop, 1995; Haykin, 1999; Aslanargun, Mammadov, Yazici, &amp; Yolacan, 2007).
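The Python sketch below illustrates backpropagation for a feedforward network with sigmoid hidden nodes and a linear output node. To keep it short, it makes several simplifying assumptions:
- a single hidden layer (the best model in this study has three);
- per-example gradient descent on squared error, without momentum;
- a made-up target y = x^2.

The class TinyMLP and the helper mse are hypothetical and do not reproduce WEKA's MultilayerPerceptron.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One sigmoid hidden layer and one linear output node (hypothetical)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; the last weight in each row is a bias.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]  # constant input that feeds the bias weights
        h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in self.w_h]
        hb = h + [1.0]
        y = sum(w * hi for w, hi in zip(self.w_o, hb))  # linear output unit
        return xb, hb, y

    def train_step(self, x, target, lr):
        """One backpropagation update on a single example (squared error)."""
        xb, hb, y = self.forward(x)
        err = y - target  # derivative of 0.5 * (y - target) ** 2 w.r.t. y
        # Hidden deltas use the old output weights and the sigmoid
        # derivative h * (1 - h).
        deltas = [err * self.w_o[j] * hb[j] * (1.0 - hb[j])
                  for j in range(len(self.w_h))]
        # Gradient-descent updates for the output and hidden weights.
        for j in range(len(self.w_o)):
            self.w_o[j] -= lr * err * hb[j]
        for j, row in enumerate(self.w_h):
            for i in range(len(row)):
                row[i] -= lr * deltas[j] * xb[i]


def mse(net, data):
    return sum((net.forward(x)[2] - t) ** 2 for x, t in data) / len(data)


# Toy regression task (assumed): learn y = x^2 on [-1, 1].
data = [([k / 10.0], (k / 10.0) ** 2) for k in range(-10, 11)]
net = TinyMLP(n_in=1, n_hidden=5)
before = mse(net, data)
for epoch in range(500):
    for x, t in data:
        net.train_step(x, t, lr=0.1)
after = mse(net, data)
```

The hidden deltas are computed before the output weights change, so each update uses the gradient of the current network, as backpropagation requires.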
3. EXPERIMENTAL RESULTS
A total of 31 models were obtained from the two regression methods and their parameter
settings: three linear regression models and 28 MLP models. These models were evaluated on
the validation data using three forecasting accuracy measures: correlation coefficient (R),
relative absolute error (RAE), and root relative squared error (RRSE).
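The three accuracy measures can be computed as sketched below. The sketch assumes the standard definitions given by Witten and Frank (2005):
- R is the Pearson correlation between predicted and actual values.
- RAE is the total absolute error divided by the total absolute deviation of the actual values from their mean.
- RRSE is the square root of the corresponding ratio of squared errors.

Here the mean of the actual values serves as the reference predictor. WEKA may compute that mean differently, for example from the training data. The example values are hypothetical.

```python
import math


def correlation(pred, actual):
    """Pearson correlation coefficient R between predictions and actual values."""
    n = len(actual)
    mp, ma = sum(pred) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    return cov / (sp * sa)


def rae(pred, actual):
    """Relative absolute error: total absolute error divided by that of
    always predicting the mean of the actual values."""
    ma = sum(actual) / len(actual)
    num = sum(abs(p - a) for p, a in zip(pred, actual))
    return num / sum(abs(a - ma) for a in actual)


def rrse(pred, actual):
    """Root relative squared error, the squared-error analogue of RAE."""
    ma = sum(actual) / len(actual)
    num = sum((p - a) ** 2 for p, a in zip(pred, actual))
    return math.sqrt(num / sum((a - ma) ** 2 for a in actual))


# Hypothetical validation values, not data from this study.
actual = [10, 20, 30, 40]
pred = [12, 18, 33, 39]
print(f"R = {correlation(pred, actual):.4f}, "
      f"RAE = {100 * rae(pred, actual):.2f}%, "
      f"RRSE = {100 * rrse(pred, actual):.2f}%")
```

An RAE or RRSE of 100% means the model is no better than always predicting the mean, so lower percentages indicate better forecasts.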
Three linear regression models were examined, differing in the attribute selection parameter:
none, M5, and greedy. The linear regression model with greedy attribute selection was the
most accurate of the linear regression models, but it was less accurate than the MLP
regression models. In our linear regression model, 25 attributes do not affect the results.
WEKA builds the regression function only from attributes that contribute statistically
to the accuracy of the model and ignores attributes that do not contribute to the regression
equation. The excluded attributes include the wholesale price of Turkey; the consumer price
index of Canada, Denmark, Spain, and Russia; the numbers of German, French, Syrian, Polish,
Romanian, Norwegian, and Swiss visitors; and the exchange rates of Russia, Canada, and
Switzerland. According to the model, these do not affect arrivals to Turkey. A positive
estimated coefficient indicates that as the value of that attribute increases, the total number of
visitors increases. A negative estimated coefficient indicates the opposite: the larger the value of
that attribute, the lower the predicted total number of tourists.
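WEKA's attribute-selection procedures are not reproduced here. The sketch below illustrates the general idea that only statistically useful attributes are kept. It assumes a greedy backward elimination driven by the Akaike information criterion (AIC). Starting from all attributes, it repeatedly removes the attribute whose removal lowers the AIC the most, and it stops when no removal helps. The functions ols, aic, and greedy_backward and the toy data are hypothetical.

```python
import math
import random


def ols(A, y):
    """Least-squares coefficients for design matrix A (list of rows)."""
    k = len(A[0])
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan.
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [val / piv for val in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k] for row in M]


def aic(X, y, cols):
    """AIC (up to a constant) of an OLS fit with an intercept and the given columns."""
    A = [[1.0] + [row[j] for j in cols] for row in X]
    b = ols(A, y)
    rss = sum((t - sum(bi * ai for bi, ai in zip(b, r))) ** 2
              for r, t in zip(A, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * len(b)


def greedy_backward(X, y):
    """Repeatedly drop the attribute whose removal lowers the AIC most."""
    cols = list(range(len(X[0])))
    best = aic(X, y, cols)
    while cols:
        trials = [(aic(X, y, [c for c in cols if c != d]), d) for d in cols]
        score, drop = min(trials)
        if score >= best:  # no single removal improves the AIC: stop
            break
        best, cols = score, [c for c in cols if c != drop]
    return cols


# Toy data (assumed): attribute 1 is pure noise, attributes 0 and 2 matter.
rng = random.Random(42)
X = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(40)]
y = [2 + 3 * x0 - 1.5 * x2 + rng.gauss(0, 0.1) for x0, _, x2 in X]
kept = greedy_backward(X, y)
```

In this toy setup the two informative attributes are always kept. Whether the noise attribute is dropped depends on the random sample, because the AIC penalty does not guarantee that an irrelevant attribute is removed.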
Table 1 Overall performance of linear regression and MLP methods
Model | Correlation coefficient (R) | Relative absolute error (RAE) | Root relative squared error (RRSE)
Linear Regression | 0.978 | 18.73% | 20.70%
MLP Regression | 0.9874 | 14.17% | 15.86%

Figure 1 Comparison of MLP and linear regression methods
Among the MLP regression models, the best forecasting accuracy was achieved by the model with
three hidden layers of 30, 15, and 10 neurons (abbreviated as 30-15-10). This model uses a
learning rate of 0.03, a momentum of 0.8, and 500 training epochs. It is trained with the
backpropagation algorithm, with a sigmoid activation function for the hidden nodes and an
unthresholded linear activation function for the output node. It achieved R = 0.9874,
RAE = 14.17%, and RRSE = 15.86%.
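The learning rate and momentum reported above enter the weight update of backpropagation. The sketch below assumes the classical momentum rule: the new weight change is the momentum times the previous change minus the learning rate times the gradient. It applies this rule to a one-parameter toy function rather than to the 30-15-10 network. The helper momentum_step is hypothetical, and WEKA may arrange these terms differently.

```python
def momentum_step(weights, grads, velocity, lr=0.03, momentum=0.8):
    """One gradient-descent update with classical momentum (hypothetical helper).

    Each weight change is the current gradient step plus a fraction
    (momentum) of the previous change.
    """
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    weights = [w + dv for w, dv in zip(weights, velocity)]
    return weights, velocity


# Toy use (assumed): minimise f(w) = (w - 3)^2 for 500 epochs with the
# learning rate and momentum reported for the 30-15-10 model.
w, v = [0.0], [0.0]
for epoch in range(500):
    grad = [2.0 * (w[0] - 3.0)]  # derivative of (w - 3)^2
    w, v = momentum_step(w, grad, v)
# w[0] ends up very close to the minimiser 3.0
```

The momentum term reuses part of the previous step, which damps oscillations and speeds up progress along directions where the gradient keeps the same sign.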
The experimental results of this study support the findings discussed in the literature
review. As Table 1 shows, the machine learning MLP regression model performs better than
the statistical linear regression model.
4. CONCLUSIONS
This study presents a multivariate time-series approach to forecasting tourism demand to
Turkey using linear regression and multilayer perceptron methods. Real data sets for
Turkey and its 24 top-ranked tourist-source countries are used to compare the performance
of these methods in forecasting tourism demand to Turkey. The comparison of the experimental
results demonstrated that the MLP method had better forecasting accuracy than linear
regression: the MLP model produced lower prediction error and higher prediction accuracy.
Based on these experiments, the tuned MLP method applied to the multivariate time series
forecasts tourism demand to Turkey satisfactorily.
In this study, the linear regression model with greedy attribute selection and the MLP
(30-15-10) model performed better than the other models of their respective types in
forecasting the number of monthly tourist arrivals to Turkey, as measured by RAE and RRSE.
Unfortunately, there is no definitive or systematic method for selecting the appropriate model. Our
study showed that, among the methods mentioned above, MLP regression performs better, but
numerous experiments are still needed to find the most suitable MLP regression model for
multivariate time-series forecasting.
REFERENCES
Al-Saba, T., &amp; El-Amin, I. (1999). Artificial neural networks as applied to long-term demand
forecasting. Artificial Intelligence in Engineering.
Archer, B. (1987). Demand forecasting and estimation. In Travel, Tourism, and Hospitality
Research: A Handbook for Managers and Researchers (pp. 77–85).
Aslan, A., Kaplan, M., &amp; Kula, F. (2008). International Tourism Demand for Turkey: A Dynamic
Panel Data Approach. Munich Personal RePEc Archive, MPRA Paper No. 10601.
Aslanargun, A., Mammadov, M., Yazici, B., &amp; Yolacan, S. (2007). Comparison of ARIMA,
neural networks and hybrid models in time series: tourist arrival forecasting. Journal of
Statistical Computation and Simulation, 77(1), 29–53.
Ayata, T., Cam, E., &amp; Yıldız, O. (2007). Adaptive neuro-fuzzy inference systems (ANFIS)
application to investigate potential use of natural ventilation in new building designs in
Turkey. Energy Conversion and Management, 48, 1472–1479.
Beccali, M., Cellura, M., Lo Brano, V., &amp; Marvuglia, A. (2004). Forecasting daily urban
electric load profiles using artificial neural networks. Energy Conversion and Management.
Bishop, C. (1995). Neural Networks for Pattern Recognition.
Choy, K. L., Lee, W. B., &amp; Lo, V. (2003). Design of an intelligent supplier relationship
management system: A hybrid case based neural network approach. Expert Systems with
Applications, 24, 225–237.
Efendigil, T., Önüt, S., &amp; Kahraman, C. (2009). A decision support system for demand
forecasting with artificial neural networks and neuro-fuzzy models: A comparative analysis.
Expert Systems with Applications, 36, 6697–6707.
Gooijer, J. G. de, &amp; Hyndman, R. J. (2006). 25 years of time series forecasting. International
Journal of Forecasting, 22, 443–473.
Witten, I. H., &amp; Frank, E. (2005). Data Mining: Practical Machine Learning Tools and
Techniques (2nd ed.). New York: Elsevier.
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (2nd ed.).
Prentice Hall.
Hippert, H. S., Pedreira, C. E., &amp; Souza, R. C. (2001). Neural networks for short-term load
forecasting: A review and evaluation. IEEE Transactions on Power Systems, 16, 44–55.
Hobbs, B. F., Helman, U., Jitprapaikulsarn, S., Konda, S., &amp; Maratukulam, D. (1998).
Artificial neural networks for short-term energy forecasting: Accuracy and economic value.
Neurocomputing, 23, 71–84.
Jones, M. T. (2008). Artificial Intelligence: A Systems Approach. Infinity Science
Press LLC.
Law, R. (2000). Back-propagation learning in improving the accuracy of neural network-based tourism demand forecasting.
Law, R., &amp; Au, N. (1999). A Neural Network Model to Forecast Japanese Demand for Travel
to Hong Kong. Tourism Management.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., &amp; Witten, I. H. (2009). The WEKA Data
Mining Software: An Update. SIGKDD Explorations, 11(1).
Ministry of Culture and Tourism (2007). Tourism Strategy of Turkey – 2023. Ankara: Republic of
Turkey Ministry of Culture and Tourism.
Pai, P.-F., &amp; Hong, W.-C. (2005). An Improved Neural Network Model in Forecasting
Arrivals. Annals of Tourism Research, Vol. 32, No. 4, pp. 1138–1141, Elsevier.
Palmer, A., Montano, J. J., &amp; Sese, A. (2006). Designing an artificial neural network for
forecasting tourism time series. Tourism Management 27 781–790.
Pattie, D., &amp; Snyder, J. (1996). Using a Neural Network to Forecast Visitor Behavior. Annals
of Tourism Research.
Petropoulos, C., Nikolopoulos, K., &amp; V., A. P. (2005). A technical analysis approach to
tourism demand forecasting. Applied Economics Letters, 12, 327–333.
Reinsel, G. C. (2003). Elements of multivariate time series analysis.
Rumelhart, D. E., Hinton, G. E., &amp; Williams, R. J. (1986). Learning internal representations
by error propagation. In Parallel distributed processing (pp. 318–362). Cambridge, MA: MIT
Press.
Sabuncuoglu, I. (1998). Scheduling with neural networks: A review of the literature and new
research directions. Production Planning and Control, 9(1), 2–12.
Sozen, A., Arcaklioglu, E., &amp; Ozkaymak, M. (2005). Turkey’s net energy consumption.
Applied Energy, 81(2), 209–221.
Vellido, A., Lisboa, P. J., &amp; Vaughan, J. (1999). Neural networks in business: A survey of
applications (1992–1998). Expert Systems with Applications, 17, 51–70.
Witt, S. F., &amp; Witt, C. A. (1995). Forecasting tourism demand: A review of empirical
research. International Journal of Forecasting.
Wong, B. K., Lai, S. V., &amp; Lam, J. (2000). A bibliography of neural network business
applications research: 1994–1998. Computers &amp; Operations Research, 27, 1045–1076.
Zhang, G., Patuwo, B. E., &amp; Hu, M. Y. (1998). Forecasting with artificial neural networks: The state
of the art. International Journal of Forecasting, 14, 35–62.

Informatisation of the Judiciary in BiH: Success Factors
Nedim Fisekovic, Meliha Handzic
International Burch University, Bosnia and Herzegovina

Abstract
Informatisation of the judicial system covers all aspects of information and communication
technology (ICT), including: equipping the courts with modern information technology
equipment (desktop computers, servers and printers), setting up a local area network (LAN)
and a wide area network (WAN), establishing an electronic mail system for the judicial
institutions, developing case management systems (CMS and TCMS), developing
and establishing judicial web sites (a web portal), computer training for all employees in the
courts and prosecution offices, internet access for all users in the judiciary, and more.
</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="79">
            <name>Extent</name>
            <description>The size or duration of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="18664">
                <text>1179</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="18665">
                <text>Comparison of linear regression and neural network models forecasting tourist arrivals to Turkey</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="18666">
                <text>Cankurt, Selcuk
Subasi, Abdulhamit</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="18667">
                <text>This paper develops statistical and machine learning methods for estimating tourist arrivals, which are one of the key inputs for planning sustainable tourism development. Tourism is arguably one of the world's largest and fastest growing industries. Sustainable tourism development is one of the most promising generators of sustainable economic development. Realistic tourism projections based on accurate tourism forecasting contribute much to sustainable tourism development. Planning and developing sustainable tourism has to be seen as a complex paradigm, but one of its starting points is accurate forecasting of tourist arrivals. In this study, linear regression and neural network multilayer perceptron (MLP) implementations are considered for multivariate tourism forecasting for Turkey. A comparison of forecasting performance in terms of the correlation coefficient (R), relative absolute error (RAE) and root relative squared error (RRSE) shows that the MLP model for regression gives better performance. Keywords: Tourism forecasting; Tourism demand modelling; Time series; Linear regression; Neural networks; Multilayer perceptron; Multivariate tourism forecasting.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="40">
            <name>Date</name>
            <description>A point or period of time associated with an event in the lifecycle of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="18668">
                <text>2012-05-31</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="18669">
                <text>Conference or Workshop Item
PeerReviewed</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
    <tagContainer>
      <tag tagId="6">
        <name>H Social Sciences (General)</name>
      </tag>
    </tagContainer>
  </item>
  <item itemId="2217" public="1" featured="0">
    <fileContainer>
      <file fileId="3271">
        <src>https://omeka.ibu.edu.ba/files/original/8e11fd53365bd7b6e585e97d9f3f01ee.pdf</src>
        <authentication>ceec5c894426c2e35cd438d66a9c1e99</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="17964">
                    <text>Comparison of Machine Learning Algorithms in Recognition of Regulatory Region of DNA

Günay Karlı, Şenol Doğan
Faculty of Engineering and Information Technology, International Burch University,
Sarajevo, BIH
E –mails: gkarli@ibu.edu.ba – gkarli@yahoo.com, sdogan@ibu.edu.ba

Keywords: Data mining, machine learning, supervised learning, classification, rule-based algorithms.

Abstract
Data mining has become an important and active area of research because of the theoretical
challenges and practical applications associated with the problem of discovering interesting
and previously unknown knowledge in very large real-world databases. These databases
contain a potential gold mine of valuable information, but it is beyond human ability to analyze
massive amounts of data and elicit meaningful patterns using conventional techniques. In
this study, DNA sequences were analyzed to locate the promoter, a regulatory region of
DNA located upstream of a gene that provides a control point for regulated gene transcription.
Supervised learning algorithms, namely an artificial neural network (ANN), RULES-3 and the
newly developed keREM and IREM rule induction algorithms, were used to analyze the DNA
sequences. In the experiments, different options of keREM, RULES-3 and the ANN were used,
and according to the empirical comparisons, the algorithms appeared to be comparable to
well-known algorithms in terms of the accuracy of the extracted rules in classifying unseen data.

1. INTRODUCTION
Data mining is the process of finding hidden patterns in data. It has a wide range of
applications, such as predicting stock prices, identifying suspected terrorists and scientific
discovery, for example the analysis of DNA microarrays (Hanuman et al., 2009). Researchers can
now routinely investigate the molecular state of a cell by measuring the simultaneous
expression of tens of thousands of genes using DNA microarrays (Shelke and Deshmukh,
2007). Data mining can also be used to classify proteins on the basis of their primary
structures (sequences). One such approach comprises four steps: text mining, feature
selection, data mining and classification, with the protein sequences collected in a file
(Mhamdi and Elloumi, 2004).
Artificial neural networks are among the newest signal-processing technologies in the
engineer's toolbox. The field is highly interdisciplinary. In engineering, neural networks serve
two important functions: as pattern classifiers and as nonlinear adaptive filters.
Well-known ANN algorithms are based on the notion of the perceptron (Rosenblatt, 1962).
Perceptrons can only classify linearly separable sets of instances. If a straight line or plane
can be drawn to separate the input instances into their correct categories, the instances are
linearly separable and the perceptron will find the solution. If the instances are not linearly
separable, learning will never reach a point where all instances are classified properly.
Multilayered perceptrons (artificial neural networks) were created to try to solve this
problem (Rumelhart et al., 1986). Properly determining the size of the hidden layer is a
problem, because underestimating the number of neurons can lead to poor approximation
and generalization capabilities, while excessive nodes can result in overfitting and eventually
make the search for the global optimum more difficult. A thorough discussion of this
topic can be found in (Camargo and Yoneyama, 2001). Kon and Plaskota also studied the
minimum number of neurons and the number of instances necessary to program a given task
into feedforward neural networks (Kon and Plaskota, 2000). There are several algorithms
with which a network can be trained (Neocleous and Schizas, 2002). However, the most
well-known and widely used learning algorithm for estimating the values of the weights is the
Back Propagation (BP) algorithm. Feed-forward neural networks are usually trained by the
original back propagation algorithm or by some variant of it. Their greatest problem is that
they are too slow for most applications. One approach to speeding up training is to estimate
optimal initial weights (Yam and Chow, 2001). Another method for training multilayered
feedforward ANNs is the weight-elimination algorithm, which automatically derives the
appropriate topology and therefore also avoids problems with overfitting (Weigend et al.,
1991). Genetic algorithms have been used to train the weights of neural networks (Siddique
and Tokhi, 2001) and to find the architecture of neural networks (Yen and Lu, 2000). Bayesian
methods for training neural networks also exist; Vivarelli and Williams compare two of them
(Vivarelli and Williams, 2001).
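The perceptron's limitation can be illustrated with a minimal sketch of the classic perceptron learning rule. It assumes a step activation, zero initial weights and a learning rate of 1. The function name `train_perceptron` and the AND/XOR toy data are illustrative choices of ours and do not come from the cited works. AND is linearly separable and is learned within a few epochs. XOR is not, so no epoch ever ends without a misclassification.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule for samples of the form ((x1, x2), target).

    Returns (weights, bias, converged), where converged is True if some epoch
    finished without a single misclassification.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            # Step activation: output 1 if the weighted sum is positive, else 0
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - out
            if delta != 0:
                # Move the separating line towards the misclassified instance
                errors += 1
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
        if errors == 0:
            return w, b, True
    return w, b, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND)[2])  # True: AND is linearly separable
print(train_perceptron(XOR)[2])  # False: XOR is not linearly separable
```

Raising `epochs` does not help with XOR. No single line separates its two classes, so every epoch still contains at least one error.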
In recent years, there has been a growing amount of research on inductive learning. In its
broadest sense, induction (or inductive inference, supervised learning) is a method of moving
from the particular to the general: from specific examples to general rules (Quinlan, 1986).
Induction can be considered the process of generalizing a procedural description from
presented or observed examples. The purpose of inductive learning is to perform a synthesis
of new knowledge, and this is independent of the form given to the input information.
RIPPER is a well-known rule-based supervised learning algorithm (Cohen, 1995). It forms
rules through a process of repeated growing and pruning. Other fundamental learning
classifiers based on decision rules include the AQ family (Michalski and Chilausky, 1980)
and CN2 (Clark and Niblett, 1989). Bonarini gave an overview of fuzzy rule-based classifiers
(Bonarini, 2000). Fuzzy logic tries to improve classification and decision support systems by
allowing the use of overlapping class definitions. Furnkranz (2001) investigated the use of
round robin binarization (or pairwise classification) as a technique for handling multi-class
problems with separate-and-conquer rule learning algorithms. The PART algorithm (Frank and
Witten, 1998) infers rules by repeatedly generating partial decision trees, thus combining
the two major paradigms for rule generation: creating rules from decision trees and the
separate-and-conquer rule learning technique. Algorithms of the RULES family (Aksoy, 1993)
obtain IF-THEN rules from a given set of examples. REX-1 (Akgöbek et al., 2006) uses
the entropy value to give greater priority to the attributes with higher importance and to
obtain more general rules.
Segments of the genome coding for messenger ribonucleic acids (mRNAs), transfer ribonucleic
acids (tRNAs) and ribosomal ribonucleic acids (rRNAs) are called genes. Among these, mRNAs
determine the sequence of amino acids in proteins. The mechanism is simple in the
prokaryotic cell, where all the genes are converted into the corresponding mRNA (messenger
ribonucleic acid) and then into proteins.
Genome analysis (gene finding) typically refers to the area of computational biology that is
concerned with algorithmically identifying stretches of sequence, usually genomic DNA, that
are biologically functional. This especially includes protein-coding genes, but may also
include other functional elements such as RNA genes and regulatory regions. Gene finding is
one of the first and most important steps in understanding the genome of a species.
Computational gene prediction is relatively simple for prokaryotes, where all the genes
are converted into the corresponding mRNA and then into proteins. The process is more
complex for eukaryotic cells, where the coding DNA sequence is interrupted by non-coding
sequences called introns.
Some of the questions that biologists want to answer today are (Jayaram and Bhushan,
2000):
- Given a DNA sequence, which part of it codes for a protein and which part is junk DNA?
- Classify the junk DNA as introns, untranslated regions, transposons, dead genes, regulatory
elements, etc.
- Divide a newly sequenced genome into genes (coding) and non-coding regions.
In this study, a short DNA sequence is used as an example set to train keREM, the ANN and
RULES-3. The features of the DNA sequence are the nucleotides (a, g, c, t). The learning
system is required to generate a classifier that identifies whether or not these sequences
belong to a functional DNA region (in this study, the promoter region).
2. INTRODUCTION TO GENES
A gene is a segment of nucleic acid that contains the information necessary to produce a
functional product, usually a protein. Genes consist of a long strand of DNA (RNA in some
viruses) that contains a promoter, which controls the activity of the gene, and a coding
sequence, which determines what the gene produces.
Genes are written in an alphabet of four nucleotides, distinguished by their four bases:
adenine (A), thymine (T), guanine (G) and cytosine (C).
The bases adenine (A) and guanine (G) are purines, while thymine (T) and cytosine (C) are
pyrimidines.
2.1. Universal Genetic Code
There are four bases: adenine (A), thymine (T), guanine (G) and cytosine (C). Since there are 20
amino acids, codons of two bases would fall short, because they give only $4^2 = 16$
combinations. Codons are therefore three bases long, giving $4^3 = 64$ combinations, enough to
represent all 20 amino acids. Because 64 codons encode only 20 amino acids, most amino acids
are specified by more than one codon, i.e. the code is degenerate. ATG is usually the start
codon, and TAG, TGA and TAA are the stop codons.
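The counting argument above can be checked by enumerating every possible codon. This is a small illustrative sketch. The names `BASES`, `codons`, `START` and `STOPS` are our own.

```python
from itertools import product

BASES = "ATGC"
# Every ordered triple of bases is a possible codon
codons = ["".join(triple) for triple in product(BASES, repeat=3)]

START = "ATG"
STOPS = {"TAG", "TGA", "TAA"}

print(len(BASES) ** 2)            # 16: two-base codons cannot cover 20 amino acids
print(len(codons))                # 64: three-base codons can
print(len(codons) - len(STOPS))   # 61 codons remain to encode the 20 amino acids
print(START in codons, STOPS.issubset(codons))  # True True
```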
2.2. Genome Organisation
Genome organization refers to the sequential, not the structural, organization of the genome.
Besides the coding exons, the non-coding DNA in eukaryotes may fall into the following
classes.
Introns: DNA sequences inserted between the exons and found in the ORFs (Open Reading
Frames). They are removed by splicing after transcription. Most introns are junk inserted
within genes.
Pseudogenes: 'dead', non-functional copies of genes present elsewhere in the genome, but no
longer of any use.
Retropseudogenes: like pseudogenes, but processed, i.e. lacking introns; they are produced by
the action of reverse transcriptase (RT) on mRNA and the subsequent incorporation of the
resulting cDNA into the genome.
Transposons: jumping genes, which splice themselves in and out of the genome (in DNA form)
randomly, by the action of transposase.
Retrotransposons: transcribed into an mRNA, which encodes an RT enzyme that then copies the
mRNA back to DNA and incorporates it into the genome.
In fact, in humans only 1.5% of the entire genome length corresponds to coding DNA. This
1.5% codes for about 27,000 genes, which in turn code for proteins that are responsible for all
the cellular processes.
2.3. What are Promoters?
A promoter is a regulatory region of DNA located upstream (towards the 5' region) of a gene,
providing a control point for regulated gene transcription.
3. ARTIFICIAL NEURAL NETWORK (ANN)
An artificial neural network is an information processing paradigm inspired by the way
biological nervous systems, such as the brain, process information. The key element of this
paradigm is the novel structure of the information processing system. It is composed of a
large number of highly interconnected processing elements (neurones) working in unison to
solve specific problems. ANNs, like people, learn by example. An ANN is configured for a
specific application, such as pattern recognition or data classification, through a learning
process. Learning in biological systems involves adjustments to the synaptic connections that
exist between the neurones. This is true of ANNs as well (Domingos, 1995).
There are different types of neural networks, which can be distinguished on the basis of their
structure and the direction of signal flow. Each kind of neural network has its own method of
training. Generally, neural networks may be differentiated as follows:
- feedforward networks (one-layer networks and multi-layer networks)
- recurrent networks
- cellular networks
Multi-layer neural networks are trained using the backpropagation principle.
This section describes the training process of a multi-layer neural network using the
backpropagation algorithm. To illustrate the process, the three-layer neural network with two
inputs and one output shown in Figure 1 is used.
Figure 1: Multi-Layer Neural Network
Each neuron is composed of two units. The first unit adds the products of the weight
coefficients and the input signals. The second unit realises a nonlinear function, called the
neuron activation function. Signal e is the adder output signal, and y = f(e) is the output
signal of the nonlinear element. Signal y is also the output signal of the neuron.
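The two-unit neuron just described can be written in a few lines. This sketch assumes tanh as the activation function f. MATLAB's 'tansig', used later in the experiments, computes the same function. Like the description, the sketch uses no bias term. The function name `neuron` is hypothetical.

```python
import math


def neuron(inputs, weights, activation=math.tanh):
    """A neuron as two units: an adder followed by a nonlinear element."""
    # First unit: e is the sum of the products of weights and input signals
    e = sum(w * x for w, x in zip(weights, inputs))
    # Second unit: y = f(e) is the neuron's output signal
    return activation(e)


y = neuron([1.0, 0.5], [0.4, -0.2])
print(round(y, 4))  # 0.2913, i.e. tanh(0.4 * 1.0 - 0.2 * 0.5) = tanh(0.3)
```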
Figure 2: Teaching Process of Multi-Layer NN
To train the neural network we need a training data set. The training data set consists of input
signals (x1 and x2) paired with the corresponding target (desired output) z. Network training
is an iterative process. In each iteration, the weight coefficients of the nodes are modified
using new data from the training data set. The modification is calculated using the algorithm
described below. Each training step starts with applying both input signals from the training
set. After this stage we can determine the output signal values of each neuron in each network layer.
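The sketch below shows a standard form of the backpropagation procedure outlined above, for a network with two inputs and one output. It is not the authors' own code. It assumes:

- one hidden layer of three tanh neurons with bias terms
- a linear output neuron
- mean squared error as the loss
- full-batch gradient descent
- the hypothetical name `train_backprop`

In each epoch, the sketch runs the forward pass and propagates the output error back through the output weights and the tanh derivative 1 - h^2. It then moves every weight against its gradient.

```python
import math
import random


def train_backprop(data, hidden=3, lr=0.1, epochs=2000, seed=0):
    """Backpropagation for a 2-input, 1-hidden-layer, 1-output network.

    Hidden neurons use tanh, the output neuron is linear and the loss is the mean
    squared error. Returns (initial_loss, final_loss, predict).
    """
    rng = random.Random(seed)
    # Hidden neuron j: [w_j1, w_j2, bias_j]; output: one weight per hidden neuron + bias
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
        y = sum(wo * hj for wo, hj in zip(w_o, h)) + w_o[-1]
        return h, y

    def loss():
        return sum((forward(x)[1] - z) ** 2 for x, z in data) / len(data)

    initial = loss()
    for _ in range(epochs):
        g_h = [[0.0, 0.0, 0.0] for _ in range(hidden)]
        g_o = [0.0] * (hidden + 1)
        for x, z in data:
            h, y = forward(x)                      # forward pass
            delta_o = 2 * (y - z) / len(data)      # dLoss/dy for this sample
            g_o[-1] += delta_o
            for j in range(hidden):
                g_o[j] += delta_o * h[j]
                # Error propagated back through w_o[j] and tanh'(e) = 1 - h^2
                delta_h = delta_o * w_o[j] * (1 - h[j] ** 2)
                g_h[j][0] += delta_h * x[0]
                g_h[j][1] += delta_h * x[1]
                g_h[j][2] += delta_h
        # Gradient-descent step on every weight and bias
        for j in range(hidden):
            for k in range(3):
                w_h[j][k] -= lr * g_h[j][k]
        for j in range(hidden + 1):
            w_o[j] -= lr * g_o[j]
    return initial, loss(), lambda x: forward(x)[1]


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
before, after, predict = train_backprop(XOR)
print(f"training loss: {before:.3f} -> {after:.3f}")
```

With a small learning rate, full-batch gradient descent should lower the training loss. The sketch makes no claim about how many epochs a particular problem such as XOR needs to be solved exactly.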
4. INDUCTIVE LEARNING
Machine learning algorithms automatically build a classifier by learning the characteristics
of the categories from a set of classified documents, and then use the classifier to classify
documents into predefined categories (Khan et al., 2010). In recent years, there has been a
growing amount of research on inductive learning. In its broadest sense, induction (or
inductive inference) is a method of moving from the particular to the general: from specific
examples to general rules. Michalski explains inductive learning as follows:
"Induction can be considered the process of generalizing a procedural description from
presented or observed examples."
The purpose of inductive learning is to perform a synthesis of new knowledge, and this is
independent of the form given to the input information.
Inductive learning includes learning from examples and learning from observation and
discovery. In order to form a knowledge base using inductive learning, the first task is to
collect a set of representative examples of expert decisions. Each example belongs to a
known class (for example, + or -) and is described in terms of a number of attributes (for
example, "hair" or "eyes"). These examples may be specified by an expert as a good tutorial
set, or may come from some neutral source such as an archive. The induction process will
attempt to find a method of classifying an example, again expressed as a function of the
attributes, that explains the training examples and that may also be used to classify previously
unseen cases (Quinlan, 1993). The outcome of an induction algorithm is either a decision
tree or a set of rules (Quinlan, 1986).
4.1. RULES-3 Inductive Learning Algorithm
RULES-3 (Aksoy, 1993) is a simple algorithm for extracting a set of classification rules from
a collection of examples of objects belonging to one of a number of known classes. An
object must be described in terms of a fixed set of attributes, each with its own range of
possible values, which can be nominal or numerical. For example, the attribute "length" might
have nominal values {short, medium, long} or numerical values in the range [-10, 10].
An attribute-value pair constitutes a condition in a rule. If the number of attributes is Na, a
rule may contain between one and Na conditions. Only conjunctions of conditions are permitted
in a rule, and therefore the attributes must all be different if the rule comprises more than one
condition.
This algorithm can be summarized as follows:
Step 1. Define ranges for the attributes that have numerical values and assign labels to
those ranges.
Step 2. Set the minimum number of conditions (Ncmin) for each rule.
Step 3. Take an unclassified example.
Step 4. Nc = Ncmin - 1.
Step 5. If Nc &lt; Na then Nc = Nc + 1.
Step 6. Take all values or labels contained in the example.
Step 7. Form objects which are combinations of Nc values or labels taken from the values or
labels obtained in Step 6.
Step 8. If at least one of the objects belongs to a unique class, then form rules with those
objects; ELSE go to Step 5.
Step 9. Select the rule that classifies the highest number of examples.
Step 10. Remove the examples classified by the selected rule.
Step 11. If there are no more unclassified examples, then STOP; ELSE go to Step 3.
Here Nc is the number of conditions for each rule and Na is the number of attributes for each
example.
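The steps above can be sketched in Python for nominal attributes. This is an illustrative reading of RULES-3, not the original implementation, and it makes three simplifications:

- Step 1 (discretizing numerical attributes) is omitted.
- Candidate rules are checked for consistency against the whole training set. In Step 10, covered examples are only marked as classified.
- An example for which no consistent rule exists, even with Na conditions, is skipped.

The names `rules3`, `covers` and `classify`, and the toy weather data, are hypothetical.

```python
from itertools import combinations


def covers(conditions, example):
    """True if the example satisfies every (attribute index, value) condition."""
    return all(example[i] == v for i, v in conditions)


def rules3(examples, labels, nc_min=1):
    """Sketch of RULES-3 for nominal attributes; returns a list of (conditions, label)."""
    na = len(examples[0])                      # Na: number of attributes
    classified = [False] * len(examples)
    rules = []
    for start, example in enumerate(examples):
        if classified[start]:                  # Step 3: take an unclassified example
            continue
        pairs = list(enumerate(example))       # Step 6: its attribute-value pairs
        best = None
        for nc in range(nc_min, na + 1):       # Steps 4-5: Nc = Ncmin, Ncmin + 1, ...
            candidates = []
            for conds in combinations(pairs, nc):          # Step 7: form objects
                hits = [k for k, other in enumerate(examples) if covers(conds, other)]
                if len({labels[k] for k in hits}) == 1:    # Step 8: unique class
                    candidates.append((len(hits), conds, hits))
            if candidates:
                best = max(candidates, key=lambda c: c[0])  # Step 9: widest rule
                break
        if best is None:                       # contradictory example: no consistent rule
            continue
        _, conds, hits = best
        rules.append((conds, labels[start]))
        for k in hits:                         # Step 10: mark covered examples
            classified[k] = True
    return rules                               # Step 11: every example has been processed


def classify(rules, example, default=None):
    """Return the label of the first rule that covers the example."""
    for conds, label in rules:
        if covers(conds, example):
            return label
    return default


examples = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "yes", "no", "yes"]
rules = rules3(examples, labels)
print(rules)  # [(((1, 'hot'),), 'no'), (((1, 'cool'),), 'yes')]
```

Raising `nc_min` forces longer, more specific rules. This matches the trend reported in the experiments, where fewer conditions per rule gave fewer rules.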

4.2. keREM Inductive Learning Algorithm
In this section, keREM (Inductive Rule Extraction Method), a newly developed algorithm for
inductive learning, is introduced. It was developed to obtain IF-THEN rules from a given set of
examples, and it avoids some of the pitfalls encountered in other inductive learning algorithms.
It uses the value of a gain function to give greater priority to the attributes with higher
importance and to obtain rules that are more general.
The algorithm can be summarized as follows:
Step 1. In a given training set, the probability distribution and class distribution rate of each
attribute-value pair are computed.
Step 2. The power of classification is computed for each attribute in the data set.
Step 3. The class-based gain of each attribute-value pair is calculated using the computed
probability distributions, class distribution rate and power of classification.
Step 4. Any attribute-value whose probability distribution equals one for n = 1 can be selected
as a rule. These attribute-values are converted into rules, and the classified examples are marked.
Step 5. Go to Step 8.
Step 6. Beginning from the first unclassified example, combinations of n values are formed
by taking the attribute-values whose gain is larger.
Step 7. Each combination is applied to all of the examples in the set. Among the combinations
of n values, those matching only one class are converted into rules. The classified examples
are marked.
Step 8. If all of the examples in the training set are classified, then go to Step 11.
Step 9. Set n = n + 1.
Step 10. If n &lt; N then go to Step 6.
Step 11. If more than one rule represents the same examples, the most general one is
selected.
Step 12. End.
4.3. IREM Inductive Learning Algorithm
In this section, the newly developed IREM (Inductive Rule Extraction Method) is introduced.
It was developed to obtain IF-THEN rules from a given set of examples. It uses the
class-based entropy value to give greater priority to the attributes with higher importance
and to obtain rules that are more general.
The algorithm can be summarized as follows:
Step 1. In a given training set, the probability distribution of each attribute-value pair is
computed.
Step 2. The entropy is computed for each attribute and value.
Step 3. Using the computed probability distributions and entropy, the class-based entropy is
calculated.
Step 4. Any value whose class-based entropy equals zero for n = 1 can be selected as a rule.
These values are converted into rules, and the classified examples are marked.
Step 5. Go to Step 8.
Step 6. Beginning from the first unclassified example, combinations of n values are formed
by taking the attribute values whose class-based entropy is smaller.
Step 7. Each combination is applied to all of the examples in the set. Among the combinations
of n values, those matching only one class are converted into rules. The classified examples
are marked.
Step 8. If all of the examples in the training set are classified, then go to Step 11.
Step 9. Set n = n + 1.
Step 10. If n &lt; N then go to Step 6.
Step 11. If more than one rule represents the same examples, the most general one is
selected.
Step 12. End.
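A compact sketch of the IREM loop follows. The paper's exact class-based entropy formula is not given here. The sketch therefore substitutes the ordinary class entropy of the examples that share an attribute value, which is zero exactly when those examples all belong to one class. For each unclassified example it also keeps only the first consistent combination found. The names `value_entropy`, `covered` and `irem_sketch`, and the toy data, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def value_entropy(examples, labels, attr, value):
    """Class entropy (bits) of the examples whose attribute `attr` equals `value`.

    Assumed stand-in for IREM's class-based entropy (0 means a single class).
    """
    counts = Counter(lab for ex, lab in zip(examples, labels) if ex[attr] == value)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def covered(combo, examples):
    """Indices of the examples that satisfy every (attribute, value) pair in combo."""
    return [k for k, ex in enumerate(examples) if all(ex[a] == v for a, v in combo)]


def irem_sketch(examples, labels):
    n_attrs = len(examples[0])
    # Steps 1-3: score every attribute-value pair present in the training set
    score = {(a, ex[a]): value_entropy(examples, labels, a, ex[a])
             for ex in examples for a in range(n_attrs)}
    classified = [False] * len(examples)
    rules = []
    for n in range(1, n_attrs + 1):                 # Steps 4-10: n = 1, 2, ..., N
        for i, ex in enumerate(examples):
            if classified[i]:
                continue
            # Step 6: this example's pairs, smallest entropy first
            pairs = sorted(enumerate(ex), key=lambda p: score[p])
            for combo in combinations(pairs, n):
                hits = covered(combo, examples)
                if len({labels[k] for k in hits}) == 1:   # Step 7: one class only
                    rules.append((combo, labels[i]))
                    for k in hits:
                        classified[k] = True
                    break
        if all(classified):                         # Step 8
            break
    # Step 11: among rules covering the same examples, keep the most general one
    best = {}
    for combo, label in rules:
        key = frozenset(covered(combo, examples))
        if key not in best or len(best[key][0]) > len(combo):
            best[key] = (combo, label)
    return list(best.values())


examples = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "yes", "no", "yes"]
print(irem_sketch(examples, labels))  # [(((1, 'hot'),), 'no'), (((1, 'cool'),), 'yes')]
```

The keREM loop has the same shape. It would rank pairs by its class-based gain (larger first) instead of by entropy (smaller first).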
5. EXPERIMENTS
Using the ANN and RULES-3 systems, experiments were performed for the classification of the
promoter DNA region. For this purpose, the promoter data set available in the University of
California, Irvine Repository of Machine Learning Databases was used (Merz and Murphy,
1996).
5.1. Promoter Recognition Experiments with ANN
Using different options of the ANN, 30 sets of experiments were performed on the promoter data
set of E. coli DNA. 86 randomly selected instances of the original data set (approximately 80%
of the data set) were used as training data in the experiments (Nayır and Karlı, 2009).
In order to build, train and test an ANN to recognize the promoter region of DNA, the above
MATLAB code was written. By changing the arguments of the newff() function, ANNs with
different numbers of hidden layers and neurons were designed and tested. Using the code
above, an ANN with two hidden layers was designed, with 10 neurons in the first hidden layer
and 5 neurons in the second. The activation functions of the input layer and hidden layers were
'tansig', and the activation function of the output layer was 'purelin'. The minimum error rate
(5%, i.e. 95% accuracy) was obtained with this option of the ANN. The following tables give
the error rates of the ANN with one, two and three hidden layers in our experiments.
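The configuration described above can be sketched as a forward pass in plain Python. The sketch makes the following assumptions:

- MATLAB's 'tansig' is tanh and 'purelin' is the identity.
- The tables' "input layer" is the first layer of tansig neurons, so the first configuration of Table 2 corresponds to layer sizes [n_inputs, 25, 10, 5, 1].
- An output of at least 0.5 is read as a promoter.

The input size of 8, the threshold and the names `make_layers`, `forward` and `error_rate` are hypothetical. No training is shown.

```python
import math
import random


def make_layers(sizes, seed=0):
    """Random weights for a fully connected network with the given layer sizes.

    Each neuron is a list of one weight per input followed by a bias.
    """
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(layers, x):
    """tansig (tanh) on every layer except the last, purelin (identity) on the output."""
    for depth, layer in enumerate(layers):
        e = [sum(w * v for w, v in zip(neuron, x)) + neuron[-1] for neuron in layer]
        is_output = depth == len(layers) - 1
        x = e if is_output else [math.tanh(v) for v in e]
    return x[0]


def error_rate(outputs, targets, threshold=0.5):
    """Percentage of instances whose thresholded output disagrees with the 0/1 target."""
    wrong = sum(1 for y, t in zip(outputs, targets) if (y >= threshold) != (t == 1))
    return 100.0 * wrong / len(targets)


# Hypothetical 8 inputs, then tansig layers of 25, 10 and 5 neurons, then 1 purelin output
layers = make_layers([8, 25, 10, 5, 1])
print(forward(layers, [0, 0, 0, 1, 1, 0, 0, 0]))
print(error_rate([0.9, 0.2, 0.7, 0.4], [1, 0, 0, 0]))  # 25.0
```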

Figure 3: Performance of the NN at 100 epochs.

Figure 4: Performance of the NN at 694 epochs.
Table 1: Experiment results of ANN with one hidden layer.
Set  Neurons per layer (input / 1st hidden / output)   Epochs  Min. error rate (%)
1    10 / 5 / 1                                         947     35
2    50 / 25 / 1                                        244     25
3    100 / 50 / 1                                       1404    20

Table 2: Experiment results of ANN with two hidden layers.
Set  Neurons per layer (input / 1st / 2nd hidden / output)   Epochs  Min. error rate (%)
1    25 / 10 / 5 / 1                                          694     5
2    50 / 25 / 10 / 1                                         851     25
3    100 / 50 / 25 / 1                                        554     10
Table 3: Experiment results of ANN with three hidden layers.
Set  Neurons per layer (input / 1st / 2nd / 3rd hidden / output)   Epochs  Min. error rate (%)
1    25 / 10 / 5 / 3 / 1                                            785     45
2    50 / 25 / 10 / 5 / 1                                           978     20
3    100 / 50 / 25 / 10 / 1                                         889     30

5.2. Promoter Recognition Experiments with RULES-3
Using different values of the number of conditions, three sets of experiments were performed on
the promoter data set. 40 randomly selected instances of the original data set were used as
training data in the experiments.
When the number of conditions was set to 1, 18 rules were produced (Karlı, 2000).
Using the extracted rules, 11 instances could not be recognized, so the accuracy on test data was
90.1%.
Using different options, three sets of experiments were performed on the promoter data set.
As expected, the RULES-3 algorithm produced a rule set that classified all training examples
correctly. One important conclusion that may be drawn from Table 4 is that as the number of
conditions decreased, the number of extracted rules decreased, while the accuracy on test data
increased. The highest accuracy on test data was obtained when the number of conditions was
equal to 1.
Table 4: Results for promoter data set with different values of Nc.
Number of    Number of   Number of         Accuracy on          Accuracy on
conditions   examples    extracted rules   training data (%)    test data (%)
3            40          38                100                  61.8
2            40          25                100                  76.4
1            40          18                100                  90.1

5.3. Promoter Recognition Experiments with keREM

The most important feature of the keREM algorithm is that it can compute the class-based gain
of each attribute-value in a given training set. In this context, the probability distribution, class
distribution rate and power of classification of each nucleotide forming the DNA sequence were
first computed in terms of the promoter and non-promoter classes. In the next step, the
class-based gain was computed for each value in the DNA sequence data set using the computed
probability distributions, class distribution rate and power of classification. In this way, the
rules produced by the algorithm were formed by the attribute-values whose information value is
maximal. The rule set constructed by the method was applied to the DNA test set, and the result
was satisfactory: an accuracy of 97.17%.
5.4. Promoter Recognition Experiments with IREM
The most important feature of the IREM algorithm is that it can compute the class-based entropy of
each attribute-value pair in a given training set. First, the probability distributions of
each nucleotide forming the DNA sequence were computed for the promoter and non-promoter classes. Next, the entropy of the training set was computed. This entropy, however, does not
contain class information for the attribute-value pairs. Therefore, the class-based entropy of
each value in the DNA sequence data set was computed from the entropy of the training set and
the probability distributions of the attribute-values. The rules produced by the algorithm
were thus formed from the attribute-value pairs with maximum information value. When the
resulting rule set was applied to the DNA test set, the accuracy was satisfactory: 98.1%.
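The entropy computations described for IREM can be illustrated with a generic sketch: the entropy of the whole training set, and the entropy of the class distribution among the examples that share one attribute-value pair (a nucleotide at a position). IREM's exact class-based entropy formula is not given in this section, so this is only a stand-in showing the idea of ranking attribute-value pairs by how pure their class distribution is; the toy sequences and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def value_class_entropy(sequences, labels, position, value):
    """Class entropy among examples whose nucleotide at `position` is `value`.
    0.0 means every covered example has the same class; pairs that cover no
    example get infinity so they are never preferred."""
    covered = [y for s, y in zip(sequences, labels) if s[position] == value]
    return entropy(covered) if covered else math.inf

def best_pair(sequences, labels, alphabet="acgt"):
    """Attribute-value pair (position, nucleotide) with the purest classes."""
    pairs = [(i, b) for i in range(len(sequences[0])) for b in alphabet]
    return min(pairs, key=lambda p: value_class_entropy(sequences, labels, *p))

seqs = ["tata", "tatc", "gcgc", "gcga"]   # hypothetical toy data
labels = ["+", "+", "-", "-"]
print(entropy(labels))          # 1.0: entropy of the whole training set
print(best_pair(seqs, labels))  # (0, 'g')
```

A pair that covers very few examples can also reach zero entropy, so a practical rule inducer has to weigh purity against how many examples each pair covers.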

6. CONCLUSION
Although DNA is a complex molecule from a biochemical viewpoint, from a computer
science viewpoint it can be considered a very long string over the four-letter alphabet A, C, G, T.
One of the most important steps in analysing a new DNA sequence is finding out whether or
not it contains any genes and, if so, determining exactly where they are. To locate
functional regions in newly sequenced DNA, keREM, IREM, an artificial neural network
(ANN) and RULES-3 can be trained on known regions in already mapped sequences. The
resulting classifiers are then used to locate functional regions in the newly sequenced data.
In this study, keREM, IREM, ANN and RULES-3 were used to locate the promoter regions of DNA data.
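Once a classifier has been trained on known promoter and non-promoter windows, locating candidate regions in a newly sequenced stretch of DNA amounts to sliding a fixed-size window along it and classifying each window. The sketch below assumes 57-nucleotide windows (the length of the instances in the UCI promoter data set) and takes the classifier as a plain function; the motif-based toy classifier is hypothetical and merely stands in for a trained keREM, IREM, ANN or RULES-3 model.

```python
from typing import Callable, List, Tuple

def scan_for_promoters(genome: str,
                       classify: Callable[[str], bool],
                       window: int = 57,
                       step: int = 1) -> List[Tuple[int, str]]:
    """Slide a fixed-size window along `genome` and return (start, window)
    for every window the classifier labels as a promoter."""
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        segment = genome[start:start + window]
        if classify(segment):
            hits.append((start, segment))
    return hits

# Hypothetical stand-in classifier: flags windows containing a fixed motif.
def toy_classifier(segment: str) -> bool:
    return "tataat" in segment

genome = "gggg" + "tataat" + "cccc"
print(scan_for_promoters(genome, toy_classifier, window=6))  # [(4, 'tataat')]
```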
Some portions of DNA serve as protein-coding regions, while others serve as
regulatory markers for the processes that convert coding regions into protein. One of these
regulatory regions is the promoter, which occurs before a coding region and signals where
transcription begins. To recognize promoter regions, several series of experiments were
performed using different options of keREM, IREM and RULES-3, and ANNs with different
numbers of hidden layers and neurons.
In the first series of experiments, the ANN had one hidden layer with 5 to 50 neurons; the
minimum error rate was 15. In the second series, two hidden layers were used, the first with
10 to 50 neurons and the second with 5 to 25 neurons; the minimum error rate was 5, the best
result obtained in these experiments. In the last series, three hidden layers were used, with
10 to 50, 5 to 25 and 3 to 10 neurons respectively; the minimum error rate was 20.
Using different options of RULES-3, three sets of experiments were performed on the
promoter data set. As expected, the RULES-3 algorithm produced a rule set that classified all
training examples correctly. The highest accuracy on test data was obtained when the number of
conditions was equal to 1.
The rules formed by IREM were observed to be more general than those of keREM, RULES-3
and ANN: only 2 of the 106 test examples could not be recognized. As a result, the error
rate of IREM was lower than that of keREM, RULES-3 and ANN on the DNA sequence test set.
Table 5: The errors of some machine learning algorithms on the promoter data set.
System | Errors | Comments
REX-1 | 0/106 | Inductive L.A
ILA | 0/106 | Inductive L.A
IREM | 2/106 | Class-based entropy
keREM | 3/106 | Class-based gain
KBANN | 4/106 | A hybrid ML system
ANN | 6/106 | ANN with two hidden layers
BP | 8/106 | Standard backpropagation with one layer
RULES3 | 11/106 | Nc=1
O'Neill | 12/106 | Ad hoc tech. from the bio. lit.
NearNeigh | 13/106 | A nearest neighbours algorithm
ID3 | 19/106 | Quinlan's decision tree builder
ANN | 21/106 | ANN with three hidden layers

As Table 5 shows, on the same data set REX-1 and ILA misclassify none of the 106 test
examples (0/106), the lowest error. The second-best result belongs to IREM, with 2/106 errors.
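The error counts in Table 5 translate into test-set accuracies by simple arithmetic; the snippet below reproduces the 97.17% (keREM) and 98.1% (IREM) figures quoted in Sections 5.3 and 5.4. The system names and error counts are copied from Table 5, and the labels of the two ANN rows follow the table as reconstructed above.

```python
# (system, misclassified test examples) as listed in Table 5
TABLE_5 = [
    ("REX-1", 0), ("ILA", 0), ("IREM", 2), ("keREM", 3), ("KBANN", 4),
    ("ANN, two hidden layers", 6), ("BP", 8), ("RULES3", 11), ("O'Neill", 12),
    ("NearNeigh", 13), ("ID3", 19), ("ANN, three hidden layers", 21),
]
TEST_SIZE = 106  # examples in the promoter test set

def accuracy_percent(errors, total=TEST_SIZE):
    """Percentage of test examples classified correctly."""
    return 100.0 * (total - errors) / total

for system, errors in TABLE_5:
    print(f"{system:26s} {errors:2d}/{TEST_SIZE}  {accuracy_percent(errors):6.2f}%")
```

For keREM this gives 103/106, about 97.17%, and for IREM 104/106, about 98.11%. For RULES3 the same arithmetic gives about 89.62%, slightly below the 90.1% reported in Table 4, which suggests the two RULES-3 figures may refer to differently sized test sets.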
REFERENCES
Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
Jayaram, P., Bhushan, K. (2000). Bioinformatics For Better Tomorrow. Indian Institute of
Technology, Hauz Khas: New Delhi.
Domingos, P. (1995). Rule Induction and Instance Based Learning. IJCAI-95.
Clark, P., Niblett, T. (1989). The CN2 Induction Algorithm. Machine Learning, 3, 261-283.
Aksoy, M. S. (1993). New Algorithms for Machine Learning. University of Wales: Cardiff.
Merz, C. J., Murphy, P. M. (1996). UCI Repository of Machine Learning Database. Retrieved
April 1, 2010, from http://www.ics.uci.edu/~mlearning/MLlearning.
Nayır, A., Karlı, G. (2009). Application of Artificial Neural Network (ANN) to DNA
Sequence Analysis. In AICT 2009, The 3rd IEEE International Conference on Application of
Information and Communication Technologies.
Karlı, G. (2000). Application of Rule Induction Algorithms to DNA Sequence Analysis.
Fatih University: İstanbul.
Akgöbek, Ö., Aydın, Y. S., Aksoy, M. S. (2006). A new algorithm for automatic knowledge
acquisition in inductive learning. Knowledge-Based Systems, 19, 388-395.
Shelke, R. R., Deshmukh, V. M. (2007). Computational analysis of DNA microarray data
using data mining. Biosciences Biotechnology Research Asia, 4, 321-324.
Mhamdi, F., Elloumi, M., Rakotomalala, R. (2004). Textmining feature selection and
datamining for proteins classification. In ICTTA 2004, International Conference on
Information and Communication Technologies: From Theory to Applications, 457-458.
Hanuman, T., Raghava, M., Siva, A., Mrithyunjaya, K., Chandra, V. (2009). Performance
Comparative in Classification Algorithms Using Real Datasets. Comput Sci Syst Biol, 2, 97-100.
Khan, A., Baharudin, B., Lee, L. H., Khan, K. (2010). A Review of Machine Learning
Algorithms for Text-Documents Classification. Journal of Advances in Information
Technology.
Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan: New York.
Rumelhart, D. E., Hinton, G. E., Williams, R. J. (1986). Learning Internal Representations by
Error Propagation. In: Rumelhart, D. E., McClelland, J. L. et al. (Eds.), Parallel Distributed
Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA, 1,
318-362.
Camargo, L. S., Yoneyama, T. (2001). Specification of Training Sets and the Number of
Hidden Neurons for Multilayer Perceptrons. Neural Computation, 13, 2673-2680.
Kon, M., Plaskota, L. (2000). Information complexity of neural networks. Neural Networks,
13, 365-375.
Neocleous, C., Schizas, C. (2002). Artificial Neural Network Learning: A Comparative
Review. LNAI 2308, 300-313. Springer-Verlag Berlin Heidelberg.
Yam, J., Chow, W. (2001). Feedforward Networks Training Speed Enhancement by
Optimal Initialization of the Synaptic Coefficients. IEEE Transactions on Neural Networks,
12, 430-434.
Weigend, A. S., Rumelhart, D. E., Huberman, B. A. (1991). Generalization by weight-elimination
with application to forecasting. In: R. P. Lippmann, J. Moody, D. S. Touretzky
(Eds.), Advances in Neural Information Processing Systems 3. San Mateo, CA: Morgan
Kaufmann.
Siddique, M. N. H., Tokhi, M. O. (2001). Training Neural Networks: Backpropagation vs.
Genetic Algorithms. IEEE International Joint Conference on Neural Networks, 4, 2673-2678.
Yen, G. G., Lu, H. (2000). Hierarchical genetic algorithm based neural network design.
IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks,
168-175.
Vivarelli, F., Williams, C. (2001). Comparing Bayesian neural network algorithms for
classifying segmented outdoor images. Neural Networks, 14, 427-437.
Cohen, W. W. (1995). Learning to classify English text with ILP methods. In L. De Raedt
(Ed.), Advances in Inductive Logic Programming, 124-143. IOS Press, Amsterdam, NL.
Michalski, R. S., Chilausky, R. L. (1980). Learning by being told and learning from
examples: an experimental comparison of the two methods of knowledge acquisition in the
context of developing an expert system for soybean disease diagnosis. Policy Analysis and
Information Systems, 4.
Bonarini, A. (2000). An Introduction to Learning Fuzzy Classifier Systems. Lecture Notes in
Computer Science, 1813, 83-92.
Fürnkranz, J. (1997). Pruning algorithms for rule learning. Machine Learning, 27, 139-171.
Frank, E., Witten, I. (1998). Generating Accurate Rule Sets Without Global Optimization.
In J. Shavlik (Ed.), Machine Learning: Proceedings of the Fifteenth International
Conference. Morgan Kaufmann Publishers.
Pham, D. T., Dimov, S. S. (1997). The RULES-4 incremental inductive learning algorithm. In
R. A. Adey, G. Rzevski, R. Teti (Eds.), Applications of Artificial Intelligence in Engineering
XII. Computational Mechanics Publications.
</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="79">
            <name>Extent</name>
            <description>The size or duration of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="17958">
                <text>1211</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="17959">
                <text>Comparison of Machine Learning Algorithms in Recognition of Regulatory Region of DNA</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="17960">
                <text>Karli, Gunay</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="17961">
                <text>Keywords: Data mining, machine learning, supervised learning, classification, rule-based algorithms. Abstract: Data mining has become an important and active area of research because of the theoretical challenges and practical applications associated with discovering interesting and previously unknown knowledge in very large real-world databases. These databases are a potential gold mine of valuable information, but analysing such massive amounts of data and eliciting meaningful patterns with conventional techniques is beyond human ability. In this study, DNA sequences were analysed to locate the promoter, a regulatory region of DNA located upstream of a gene that provides a control point for regulated gene transcription. Several supervised learning algorithms, namely an artificial neural network (ANN), RULES-3 and the newly developed keREM and IREM rule induction algorithms, were used to analyse the DNA sequences. In the experiments, different options of keREM, RULES-3 and ANN were used, and according to the empirical comparisons, the algorithms appeared to be comparable to well-known algorithms in terms of the accuracy of the extracted rules in classifying unseen data.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="40">
            <name>Date</name>
            <description>A point or period of time associated with an event in the lifecycle of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="17962">
                <text>2012-05-31</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="17963">
                <text>Conference or Workshop Item
PeerReviewed</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
    <tagContainer>
      <tag tagId="88">
        <name>H Social Sciences (General),T Technology (General)</name>
      </tag>
    </tagContainer>
  </item>
  <item itemId="247" public="1" featured="0">
    <fileContainer>
      <file fileId="245">
        <src>https://omeka.ibu.edu.ba/files/original/49c638afaab4f7884c5f976f85b9cc75.pdf</src>
        <authentication>aa2806ee70d36f9ce713776c1ae3f233</authentication>
        <elementSetContainer>
          <elementSet elementSetId="4">
            <name>PDF Text</name>
            <description/>
            <elementContainer>
              <element elementId="52">
                <name>Text</name>
                <description/>
                <elementTextContainer>
                  <elementText elementTextId="1868">
                    <text>COMPARISON OF MACHINE LEARNING TECHNIQUES
IN PHISHING WEBSITE CLASSIFICATION
Adnan Hodžić
International Burch University
Bosnia and Herzegovina
adnan.hodzic@ibu.edu.ba
Jasmin Kevrić
International Burch University
Bosnia and Herzegovina
jasmin.kevric@ibu.edu.ba
Adem Karadag
Turkey
nuhadem@gmail.com
Abstract: Phishing is one of the luring strategies used by phishers to abuse the personal
details of unsuspecting clients. A phishing website is a counterfeit website with a similar
appearance but a different destination. Unsuspecting clients post their information believing
that these websites come from trusted financial institutions. New anti-phishing techniques
emerge continuously, yet phishers keep devising new strategies to break the anti-phishing
mechanisms. Hence there is a need for an effective mechanism for predicting phishing
websites. This paper compares the classification of phishing websites using
different machine learning algorithms. Random Forest (RF), C4.5, REP Tree, Decision
Stump, Hoeffding Tree, Rotation Forest and MLP were used to determine which
method provides the best results in phishing website classification. All instances are
categorized as 1 for “Legitimate”, 0 for “Suspicious” and -1 for “Phishy”. Results show
that Rotation Forest with REP Tree shows the best performance on this dataset for
classification of phishing websites.
Keywords: Machine Learning, Phishing Websites
Introduction
The Internet is significant not only for individual users but also for online business
organizations, which usually offer online trading (Liu &amp; Ye, 2003). Nevertheless, Internet
users are prone to different types of web threats that can cause financial
damage, identity theft, loss of private information, damage to brand reputation and loss
of clients’ trust in e-commerce and online banking. As a result, the suitability of the
Internet for commercial sales becomes doubtful.
Phishing is a semantic intrusion that targets the user rather than the computer.
It is a fairly new Internet crime compared with other forms such as viruses and
hacking. The phishing problem is hard because it is extremely easy for an attacker to
make a replica of a genuine website that looks very authentic to users.

ICESoS 2016 - Proceedings Book 249

International Conference on Economic and Social Studies (ICESoS’16)
Phishing attacks usually aim to acquire confidential information such as usernames,
passwords and financial IDs by tricking users. They typically start with an email that
appears to come from an authentic company, asking victims to update or validate their
information by visiting a link within the email.
The idea is that bait is cast in the hope that a user will take it and bite, just
like a fish. Usually, the bait is an instant message or an e-mail that takes the
user to a hostile phishing website (James, 2005).
The motivation behind this study is to develop a strong and effective technique that
uses data mining algorithms and mechanisms to detect phishing websites. Association
and classification algorithms can be very helpful in identifying phishing websites: they
can reveal which phishing website features and indicators are the most important and how
they relate to each other. Comparing various data mining classification and association
techniques is also a goal of this study, since only a few investigations compare different
data mining methods for predicting phishing websites.
Literature Review
Numerous methodologies are currently used to classify phishing
websites. (Aburrous, Alamgir, Keshav, &amp; Fadi, 2009) suggest a method for intelligent
phishing detection using fuzzy data mining. In that study, the degree of e-banking phishing
website detection is determined from six attributes: URL &amp; Domain Identity, Security
and Encryption, Source Code and JavaScript, Page Style and Contents, Web Address
Bar, and Social Human Factor. Fuzzy logic and data mining algorithms are applied to
classify e-banking phishing websites.
(Basnet, Ram, Srinivas, &amp; Sung, 2008) adopt a machine learning approach to identifying
phishing attacks. Support vector machines, biased support vector machines and neural
networks are used to predict phishing e-mails effectively. The objective of that
study is to classify phishing e-mails by combining basic features of phishing e-mails and
applying several machine learning algorithms to the classification process.
(Mohammad, Fadi, &amp; Lee, 2013) proposed an intelligent prototype for predicting
phishing attacks based on an artificial neural network. The same authors shed light on the
key features that distinguish phishing websites from genuine ones, evaluated how well
rule-based data mining classification methods detect phishing websites, and examined
which classification approach is more reliable (Mohammad, Lee, &amp; Fadi,
2014).
Methodology
● Dataset
The dataset used for the research is the “Phishing Websites Data Set” (“UCI Machine Learning
Repository: Phishing Websites Data Set,” 2016). It was gathered mainly from the PhishTank
archive, the MillerSmiles archive and Google’s search operators.

Regional Economic Development: Entrepreneurship and Innovation
Its authors shed light on the key features that have proven to be sound and effective
in predicting phishing websites, while proposing some new features, experimentally
assigning new rules to some well-known features and updating others.
The dataset is divided into three parts: a training set and two test sets. The training set has
11055 instances, and the test sets have 2456 and 2670 instances. All instances are categorized
as 1 for “Legitimate”, 0 for “Suspicious” and -1 for “Phishy”.
The dataset’s phishing criteria are divided into four groups (Address Bar based Features,
Abnormal Based Features, HTML and JavaScript based Features and Domain
based Features), comprising 30 attributes in total.
Table 1: Phishing features
Features group | Features factor indicators
Address Bar based Features | Using the IP Address; Long URL to Hide the Suspicious Part; Using URL Shortening Services “TinyURL”; URL’s having “@” Symbol; Redirecting using “//”; Adding Prefix or Suffix Separated by (-) to the Domain; Sub Domain and Multi Sub Domains; HTTPS (Hyper Text Transfer Protocol with Secure Sockets Layer); Domain Registration Length; Favicon; Using Non-Standard Port; The Existence of “HTTPS” Token in the Domain Part of the URL
Abnormal Based Features | Request URL; URL of Anchor; Links in &lt;Meta&gt;, &lt;Script&gt; and &lt;Link&gt; tags; Server Form Handler (SFH); Submitting Information to Email; Abnormal URL
HTML and JavaScript based Features | Website Forwarding; Status Bar Customization; Disabling Right Click; Using Pop-up Window; IFrame Redirection
Domain based Features | Age of Domain; DNS Record; Website Traffic; PageRank; Google Index; Number of Links Pointing to Page; Statistical-Reports Based Feature
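To make the feature encoding concrete, here is a minimal sketch of how two address-bar features from Table 1 could be turned into values using the 1 (legitimate) / -1 (phishy) convention stated above. The detection rules are simplified assumptions for illustration, not the exact rules used by the dataset's authors; for example, hexadecimal-encoded IP addresses are not recognized.

```python
import ipaddress
from urllib.parse import urlparse

LEGITIMATE, PHISHY = 1, -1  # label convention used by the dataset

def ip_address_feature(url: str) -> int:
    """'Using the IP Address': phishy if the host part is a literal IP address."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return LEGITIMATE
    return PHISHY

def at_symbol_feature(url: str) -> int:
    """'URL's having @ Symbol': text before '@' in the authority part is treated
    as user information, not as the host, so it can disguise the real host."""
    return PHISHY if "@" in url else LEGITIMATE

for u in ["http://192.168.0.1/login", "https://www.example.com/",
          "http://www.example.com@198.51.100.7/"]:
    print(u, ip_address_feature(u), at_symbol_feature(u))
```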

● Algorithms
Several machine learning algorithms were used in the experiments.
1. Multilayer Perceptron (MLP)
The Multilayer Perceptron is the most popular and most frequently used neural network
classifier design (Bishop, 1995). An artificial neural network consists of a large number of
interconnected processing components, known as neurons, that act like simple processors.
It is a mathematical model for classifying non-linear data into distinct classes. The MLP is a
feed-forward network architecture consisting of an input layer, one or more hidden layers and
an output layer.
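The feed-forward computation described above can be sketched in a few lines of NumPy. This shows only the forward pass through randomly initialized layers; training (for example by backpropagation, which WEKA's MultilayerPerceptron uses) is omitted, and the layer sizes (30 inputs, 16 hidden units, 3 outputs) and the sigmoid activation are assumptions chosen to match the 30 attributes and three labels of the phishing data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    """Propagate a batch x (n_samples x n_inputs) through fully connected
    layers given as (weights, bias) pairs, applying a sigmoid at each layer."""
    activation = x
    for weights, bias in layers:
        activation = sigmoid(activation @ weights + bias)
    return activation

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 30, 16, 3
layers = [
    (rng.normal(scale=0.1, size=(n_inputs, n_hidden)), np.zeros(n_hidden)),    # input -> hidden
    (rng.normal(scale=0.1, size=(n_hidden, n_outputs)), np.zeros(n_outputs)),  # hidden -> output
]
x = rng.choice([-1, 0, 1], size=(5, n_inputs))  # five hypothetical instances
scores = mlp_forward(x, layers)
print(scores.shape)           # (5, 3)
print(scores.argmax(axis=1))  # index of the highest-scoring class per instance
```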
2. Random Forest
Random forests are a combination of tree predictors in which each tree depends on the
values of a random vector sampled independently and with the same distribution for
all trees in the forest. The generalization error of a forest converges almost surely (a.s.)
to a limit as the number of trees in the forest becomes large. The generalization error of a
forest of tree classifiers depends on the strength of the individual trees in the forest and the
correlation between them (Breiman, 2001).
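The two ingredients named above, an independently drawn random vector per tree and a vote over the trees, can be illustrated with a deliberately simplified forest in NumPy: each tree is trained on a bootstrap sample using a random subset of the features, and predictions are combined by majority vote. To keep the sketch short, every tree is a one-level stump rather than the fully grown tree of Breiman's method, and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(X, y, features):
    """Depth-1 tree: among the given features, pick the split x[f] >= t with
    the fewest training errors; each side predicts its majority class."""
    default = majority(y.tolist(), None)
    best = None
    for f in features:
        for t in np.unique(X[:, f]):
            right = X[:, f] >= t
            left_label = majority(y[~right].tolist(), default)
            right_label = majority(y[right].tolist(), default)
            errors = int(np.sum(np.where(right, right_label, left_label) != y))
            if best is None or best[0] > errors:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, X):
    f, t, left_label, right_label = stump
    return np.where(X[:, f] >= t, right_label, left_label)

def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Each tree sees a bootstrap sample of rows and a random feature subset."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max_features or max(1, int(np.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)             # bootstrap sample
        feats = rng.choice(d, size=k, replace=False)  # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    """Majority vote over the trees' predictions."""
    votes = np.stack([predict_stump(s, X) for s in forest])  # (trees, samples)
    return np.array([Counter(col.tolist()).most_common(1)[0][0] for col in votes.T])

X = np.array([[0], [1], [2], [3], [10], [11], [12], [13]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
forest = fit_forest(X, y, n_trees=25)
print(predict_forest(forest, np.array([[1], [12]])))  # expected: -1 then 1
```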
3. Decision Trees
Decision tree classification produces its output as a tree-like structure
called a decision tree. A decision tree model contains rules for predicting the target
variable. Decision tree algorithms scale well, even when the number of training examples
varies and large databases contain many attributes.
a) J48
J48 is an implementation of the C4.5 decision tree algorithm. J48 uses a
greedy technique to induce decision trees for classification (Chen, Zheng, Lloyd,
Jordan, &amp; Brewer, 2004). A decision tree model is built by examining the training data,
and the model is then used to classify unseen data.
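C4.5, and hence J48, chooses splits greedily using the gain ratio: the information gain of an attribute divided by the entropy of the attribute's own value distribution (the split information), which penalizes attributes with many distinct values. A minimal sketch for a single nominal attribute follows; it is not WEKA's code, and the toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio of one nominal attribute (values) with respect to labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own values
    return gain / split_info if split_info else 0.0

labels = [1, 1, -1, -1]
print(gain_ratio(["yes", "yes", "no", "no"], labels))  # 1.0: perfect binary split
print(gain_ratio(["a", "b", "c", "d"], labels))        # 0.5: same gain, higher split info
```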
b) Reduced-Error Pruning (REPTree)
REPTree is a fast decision tree learner. It builds a decision/regression tree using
information gain/variance and prunes it using reduced-error pruning (with backfitting).
REPTree sorts the values of numeric attributes only once. Missing values are dealt with by
splitting the corresponding instances into pieces (i.e. as in C4.5).
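Reduced-error pruning itself can be sketched independently of how the tree was grown: working bottom-up, each internal node is replaced by a leaf predicting its training-set majority class whenever that does not increase the error on a separate pruning set. The dictionary-based tree format below is a hypothetical representation chosen for brevity, and the backfitting step that REPTree also performs is not shown.

```python
def predict(node, x):
    """Follow the tree to a leaf; unseen values fall back to the node's majority."""
    while "label" not in node:
        child = node["children"].get(x[node["feature"]])
        if child is None:
            return node["majority"]
        node = child
    return node["label"]

def count_errors(node, examples):
    return sum(1 for x, y in examples if predict(node, x) != y)

def reduced_error_prune(node, pruning_set):
    """Bottom-up: replace a subtree by a majority-class leaf whenever the leaf
    makes no more errors on the pruning examples that reach this node."""
    if "label" in node:
        return node
    for value, child in list(node["children"].items()):
        reaching = [(x, y) for x, y in pruning_set if x[node["feature"]] == value]
        node["children"][value] = reduced_error_prune(child, reaching)
    leaf = {"label": node["majority"]}
    if count_errors(node, pruning_set) >= count_errors(leaf, pruning_set):
        return leaf
    return node

# Hypothetical tree: feature 0 splits into "a" (an over-fitted subtree) and "b".
tree = {"feature": 0, "majority": "yes", "children": {
    "a": {"feature": 1, "majority": "yes", "children": {
        "x": {"label": "yes"}, "y": {"label": "no"}}},
    "b": {"label": "no"},
}}
pruning_set = [(("a", "x"), "yes"), (("a", "y"), "yes"), (("b", "x"), "no")]
pruned = reduced_error_prune(tree, pruning_set)
print(pruned)  # the subtree under "a" collapses to a single "yes" leaf
```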
c) Decision Stump
A decision stump is a one-level decision tree; WEKA's DecisionStump class builds and uses one.
It is usually used in combination with a boosting algorithm. The algorithm performs
regression (based on mean-squared error) or classification (based on entropy). Missing values
are treated as a separate value (“DecisionStump”, 2016).
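For nominal attributes, a decision stump can be sketched as a single test of the form "attribute equals value", with a third branch for missing values, chosen to minimize the weighted class entropy of the branches. This mirrors the description above (entropy-based classification, missing values treated separately) but is a simplified illustration rather than WEKA's DecisionStump source; the branch names and toy rows are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def branch_of(row, attribute, value):
    """Route a row to the 'missing', 'eq' or 'ne' branch of the stump."""
    if row[attribute] is None:
        return "missing"
    return "eq" if row[attribute] == value else "ne"

def fit_decision_stump(rows, labels):
    """Choose the (attribute, value) test with the lowest weighted entropy."""
    n = len(labels)
    best = None
    for a in range(len(rows[0])):
        observed = sorted({r[a] for r in rows if r[a] is not None}, key=str)
        for v in observed:
            branches = {"eq": [], "ne": [], "missing": []}
            for r, y in zip(rows, labels):
                branches[branch_of(r, a, v)].append(y)
            score = sum(len(b) / n * entropy(b) for b in branches.values())
            if best is None or best[0] > score:
                leaves = {k: Counter(b).most_common(1)[0][0] if b else None
                          for k, b in branches.items()}
                best = (score, a, v, leaves)
    return best[1:]  # (attribute index, test value, {branch: predicted class})

def predict_stump(stump, row):
    attribute, value, leaves = stump
    return leaves[branch_of(row, attribute, value)]

rows = [("x", 1), ("x", 0), ("y", 0), (None, 1)]
labels = ["p", "p", "q", "q"]
stump = fit_decision_stump(rows, labels)
print(stump[:2])                        # (0, 'x')
print(predict_stump(stump, (None, 0)))  # q: learned for the missing branch
```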
d) Hoeffding Tree
A Hoeffding tree (VFDT) is an incremental, anytime decision tree induction algorithm
that can learn from massive data streams, assuming that the distribution generating
examples does not change over time. Hoeffding trees exploit the fact that a small sample
is often enough to choose the best splitting attribute. This idea is supported by the
Hoeffding bound, which quantifies the number of observations needed to estimate a statistic
within a prescribed precision (Hulten, Geoff, Laurie, &amp; Pedro, 2001).
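The Hoeffding bound referred to above states that, after n independent observations of a variable with range R, the true mean is at least the observed mean minus ε = sqrt(R² ln(1/δ) / (2n)), with probability 1 - δ. In VFDT-style learners a leaf is split once the observed gain of the best attribute exceeds that of the runner-up by more than ε. The sketch below computes this; δ = 1e-7 is an assumed default, and R = log2(number of classes) is the range of information gain.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n, n_classes, delta=1e-7):
    """Split when the best attribute's observed gain beats the runner-up by
    more than epsilon; information gain ranges over [0, log2(n_classes)]."""
    epsilon = hoeffding_bound(math.log2(n_classes), delta, n)
    return best_gain - second_gain > epsilon

# Three classes, as in the phishing data (legitimate / suspicious / phishy):
for n in (100, 1000, 10000):
    print(n, round(hoeffding_bound(math.log2(3), 1e-7, n), 4))
```

The bound shrinks with the square root of n, which is why a modest sample often suffices to fix a split.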
4. Rotation Forest
Rotation Forest is an ensemble technique that trains L decision trees independently,
using a different set of extracted features for each tree. Rotation Forest (Rodriguez,
Kuncheva, &amp; Alonso, 2006) builds on the Random Forest idea. The base classifiers
are also independently built decision trees, but in Rotation Forest every tree is trained
on the whole data set in a rotated feature space. Because the tree learning algorithm
builds classification regions from hyperplanes parallel to the feature axes, a
small rotation of the axes may lead to a very different tree.
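The rotation step can be sketched with NumPy: the features are split at random into disjoint subsets, principal component analysis is run on each subset using a bootstrap sample of the rows, and the resulting axes are placed in a block matrix R so that a base tree can be trained on X @ R. This is a simplified sketch of the idea in Rodriguez et al. (2006): it omits their random selection of class subsets before each PCA, and the 75% sample size, the three feature subsets and the random data are assumptions.

```python
import numpy as np

def rotation_matrix(X, n_subsets, rng, sample_fraction=0.75):
    """Build one Rotation Forest style rotation matrix for data X (n x d)."""
    n, d = X.shape
    subsets = np.array_split(rng.permutation(d), n_subsets)  # disjoint feature groups
    R = np.zeros((d, d))
    for cols in subsets:
        rows = rng.integers(0, n, size=max(2, int(sample_fraction * n)))  # bootstrap rows
        sample = X[np.ix_(rows, cols)]
        cov = np.atleast_2d(np.cov(sample, rowvar=False))  # np.cov centres the data
        _, axes = np.linalg.eigh(cov)                      # columns = principal axes
        R[np.ix_(cols, cols)] = axes
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))            # hypothetical data: 50 instances, 6 features
R = rotation_matrix(X, n_subsets=3, rng=rng)
X_rotated = X @ R                       # one base tree would be trained on this
print(np.allclose(R.T @ R, np.eye(6)))  # True: R is orthogonal
```

Because each block holds orthonormal eigenvectors and the blocks do not overlap, R is a pure rotation (up to reflections), so no information in X is lost; only the axes the tree splits along change.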
● Feature Ranking
Feature ranking was performed in WEKA using Correlation Attribute Evaluation
(“CorrelationAttributeEval,” 2016). It evaluates the worth of an attribute by measuring the
Pearson correlation between it and the class. Nominal attributes are considered on a
value-by-value basis, treating each value as an indicator. An overall correlation for a
nominal attribute is obtained via a weighted average.
We selected all attributes whose weight is above 0.1:
❏ HTTPS
❏ URL of Anchor
❏ Adding Prefix or Suffix Separated by (-) to the Domain
❏ DNS Record
❏ Sub Domain and Multi Sub Domains
❏ Request URL
❏ Domain Registration Length
❏ Server Form Handler (SFH)
❏ Links in &lt;Meta&gt;, &lt;Script&gt; and &lt;Link&gt; tags
❏ Google Index
❏ Age of Domain
❏ PageRank
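The ranking criterion described above can be approximated as follows: for each value of a nominal attribute, compute the Pearson correlation between a 0/1 indicator of that value and the class, then average the absolute correlations weighted by how often each value occurs. This is a reading of the CorrelationAttributeEval description, not WEKA's source; encoding the class numerically (1, 0, -1) and taking absolute values are assumptions, and the toy columns are hypothetical. The 0.1 threshold matches the selection rule above.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation of two 1-D arrays (0.0 if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def correlation_merit(feature, target):
    """Frequency-weighted mean |r| between each value's indicator and the class."""
    feature = np.asarray(feature)
    target = np.asarray(target, dtype=float)
    merit = 0.0
    for value in np.unique(feature):
        indicator = (feature == value).astype(float)
        merit += indicator.mean() * abs(pearson(indicator, target))
    return merit

# Hypothetical columns encoded like the dataset (1, 0, -1); target = class label.
target = [1, 1, -1, -1, 1, -1]
columns = {
    "informative": [1, 1, -1, -1, 1, -1],
    "constant": [1, 1, 1, 1, 1, 1],
}
merits = {name: correlation_merit(col, target) for name, col in columns.items()}
selected = [name for name, m in merits.items() if m > 0.1]
print(merits, selected)
```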

Experiments and Results
All experiments were conducted in the WEKA tool (“Weka 3 - Data Mining with Open
Source Machine Learning Software in Java,” 2016), an open-source data
mining application written in Java at the University of Waikato.
Table 2: Full training set results (correct classification)
Classifier | Test 1 | Test 2
MLP | 85.5% | 85%
Random Forest | 85.7% | 84.5%
C4.5 | 74.6% | 73%
REPTree | 88.4% | 88%
Decision Stump | 86.1% | 87%
Hoeffding Tree | 87.3% | 88.4%
Rotation Forest (REP Tree) | 89.1% | 88.5%
Rotation Forest (Hoeffding Tree) | 88% | 84.6%

The results show that the Rotation Forest algorithm with REP Tree as the base classifier gives
the best results on both test sets, with 89.1% and 88.5% accuracy respectively. The other
classifiers were not far behind, except C4.5 with 74.6% and 73% on the two test sets.
After ranking the features with Correlation Attribute Evaluation, we applied the same
classifiers to the reduced set. The results are very close to those obtained with the full
training set. Surprisingly, MLP results improved on both test sets, to 89% and 86.4%. MLP is
also the best classifier on the first test set, only 0.1 percentage points below the result of
Rotation Forest with REP Tree on the full training set. REP Tree was the best classifier on
test set 2, with 87.6% correct classification.
The drop in mean correct classification (over all classifiers and both test sets) after feature
reduction is 1.17 percentage points.

Table 3: Reduced training set results (correct classification)
Classifier | Test 1 | Test 2
MLP | 89% | 86.4%
Random Forest | 81.8% | 80.2%
C4.5 | 73.9% | 73%
REPTree | 87.1% | 87.6%
Decision Stump | 86.1% | 87%
Hoeffding Tree | 82.1% | 83.4%
Rotation Forest (REP Tree) | 88.9% | 87%
Rotation Forest (Hoeffding Tree) | 87.5% | 84%

Comparing the two result tables, Rotation Forest with REP Tree as the base classifier gives
the best overall result, with a mean of 88.37% correct classification across both tables and
test sets, while MLP is the only classifier whose results improve when feature reduction is applied.
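The summary figures quoted in this section (the 1.17-point mean drop, the 88.37% overall result of Rotation Forest with REP Tree, and MLP's change in mean accuracy) can be recomputed from Tables 2 and 3. The values below are copied from those tables; "mean" here is the unweighted average over classifiers and test sets, which is the averaging that reproduces the reported numbers.

```python
FULL = {  # Table 2: (test 1, test 2) correct classification in %
    "MLP": (85.5, 85.0), "Random Forest": (85.7, 84.5), "C4.5": (74.6, 73.0),
    "REPTree": (88.4, 88.0), "Decision Stump": (86.1, 87.0),
    "Hoeffding Tree": (87.3, 88.4), "Rotation Forest (REP Tree)": (89.1, 88.5),
    "Rotation Forest (Hoeffding Tree)": (88.0, 84.6),
}
REDUCED = {  # Table 3
    "MLP": (89.0, 86.4), "Random Forest": (81.8, 80.2), "C4.5": (73.9, 73.0),
    "REPTree": (87.1, 87.6), "Decision Stump": (86.1, 87.0),
    "Hoeffding Tree": (82.1, 83.4), "Rotation Forest (REP Tree)": (88.9, 87.0),
    "Rotation Forest (Hoeffding Tree)": (87.5, 84.0),
}

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def grand_mean(table):
    """Unweighted mean over every classifier and both test sets."""
    return mean(acc for pair in table.values() for acc in pair)

drop = grand_mean(FULL) - grand_mean(REDUCED)
rotf_rep = mean(FULL["Rotation Forest (REP Tree)"] + REDUCED["Rotation Forest (REP Tree)"])
print(f"mean drop: {drop:.2f} points")                                   # 1.17
print(f"Rotation Forest (REP Tree), four runs: {rotf_rep:.3f}%")         # 88.375
print(f"MLP: {mean(FULL['MLP']):.2f}% -> {mean(REDUCED['MLP']):.2f}%")  # 85.25 -> 87.70
```

The same arithmetic gives 87.95% for Rotation Forest (REP Tree) and 87.70% for MLP on the reduced set alone, so MLP's lead after feature reduction holds on test set 1 only.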
Discussion
(Mohammad et al., 2014) conducted a similar feature selection, in which they
selected nine features (Request URL, Age of Domain, HTTPS and SSL, Website Traffic,
Long URL, Sub Domain and Multi Sub Domain, Adding Prefix or Suffix Separated by (−)
to Domain, URL of Anchor and Using the IP Address). Comparing their selected
attributes with ours, six features are shared: Request URL, Age of Domain, HTTPS and SSL,
Sub Domain and Multi Sub Domain, Adding Prefix or Suffix Separated by (−) to Domain,
and URL of Anchor.
Moreover, all 30 features fall within four feature groups: Address Bar based Features,
Abnormal Based Features, HTML and JavaScript based Features, and Domain based Features.
However, none of the 12 selected features falls within the “HTML and JavaScript based
Features” group. This raises the question of whether this group of features is relevant to
the classification of phishing websites.
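Both observations in this discussion are simple set operations over the feature names, sketched below with the names abbreviated (for instance, "HTTPS" stands for HTTPS and SSL). The lists are transcribed from Table 1, the selected-feature list above and the nine features of Mohammad et al. (2014).

```python
SELECTED = {  # the 12 features kept by Correlation Attribute Evaluation
    "HTTPS", "URL of Anchor", "Prefix or Suffix", "DNS Record", "Sub Domain",
    "Request URL", "Domain Registration Length", "SFH",
    "Links in Meta/Script/Link tags", "Google Index", "Age of Domain", "PageRank",
}
MOHAMMAD_2014 = {  # the nine features selected by Mohammad et al. (2014)
    "Request URL", "Age of Domain", "HTTPS", "Website Traffic", "Long URL",
    "Sub Domain", "Prefix or Suffix", "URL of Anchor", "Using the IP Address",
}
HTML_AND_JAVASCRIPT = {  # the HTML and JavaScript based group from Table 1
    "Website Forwarding", "Status Bar Customization", "Disabling Right Click",
    "Using Pop-up Window", "IFrame Redirection",
}

shared = SELECTED.intersection(MOHAMMAD_2014)
print(len(shared), sorted(shared))               # 6 shared features
print(SELECTED.isdisjoint(HTML_AND_JAVASCRIPT))  # True: no HTML/JS feature selected
```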
Conclusion
Phishing website detection has received considerable attention because it helps to
recognize unwanted content and threats. Hence, many researchers focus on finding the best
classifier for recognizing phishing websites.
This work models phishing website prediction as a classification task and
presents a machine learning approach for predicting whether a given website
is legitimate or phishing. Multilayer perceptron, decision tree classifiers and
Rotation Forest have been applied to train the prediction model. A training set
of 11055 instances and two test sets of 2456 and 2670 instances, each with 30 attributes,
have been prepared to facilitate training and implementation.
The results show that the Rotation Forest algorithm with REP Tree as the base classifier
performs best on the full training set, while MLP gives the best result on test set 1 of the
reduced set. When the training set was reduced from 30 attributes to 12, the mean accuracy
over all classifiers dropped by 1.17 percentage points. Meanwhile, MLP’s mean accuracy
increased from 85.25% to 87.7%.
It is hoped that more interesting results will follow from further exploration of the data.
References
• Liu, Jiming, and Yiming Ye. E­commerce Agents: Marketplace Solutions, Security
Issues, and Supply and Demand. Berlin: Springer, 2001. Print.
• Aburrous, M. R., Alamgir, H., Keshav, D., &amp; Fadi, T. (2009). Modelling Intelligent
Phishing Detection System for E­
banking Using Fuzzy Data Mining. In 2009
International Conference on CyberWorlds. http://doi.org/10.1109/cw.2009.43
• Basnet, R., Ram, B., Srinivas, M., &amp; Sung, A. H. (n.d.). Detection of Phishing
Attacks: A Machine Learning Approach. In Studies in Fuzziness and Soft
Computing (pp. 373–383).
• Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University
Press. Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32.
• Chen, M., Zheng, A. X., Lloyd, J., Jordan, M. I., &amp; Brewer, E. (n.d.). Failure
diagnosis usingdecision trees. In International Conference on Autonomic
Computing, 2004. Proceedings. http://doi.org/10.1109/icac.2004.1301345
• CorrelationAttributeEval. (n.d.). Retrieved May 9, 2016, from http://weka.
sourceforge.net/doc.dev/weka/attributeSelection/CorrelationAttributeEval.
html
• DecisionStump. (n.d.). Retrieved May 9, 2016, from http://weka.sourceforge.net/
doc.dev/weka/classifiers/trees/DecisionStump.html
• Hulten, G., Geoff, H., Laurie, S., &amp; Pedro, D. (2001). Mining time­
changing
data streams. In Proceedings of the seventh ACM SIGKDD international
conference on Knowledge discovery and data mining ­KDD ’01. http://doi.
org/10.1145/502512.502529
• James, L. (2005). Phishing Exposed. Syngress.
• Liu, J., &amp; Ye, Y. (2003). E­Commerce Agents: Marketplace Solutions, Security
Issues, and Supply and Demand. Springer.
• Mohammad, R. M., Fadi, T., &amp; Lee, M. (2013). Predicting phishing websites based
on self­structuring neural network. Neural Computing &amp; Applications, 25(2), 443–
458.
• Mohammad, R. M., Lee, M., &amp; Fadi, T. (2014). Intelligent rule­
based phishing
websites classification. IET Information Security, 8(3), 153–160.
• Rodriguez, J. J., Kuncheva, L. I., &amp; Alonso, C. J. (2006). Rotation Forest: A New
Classifier
• Ensemble Method. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 28(10), 1619–1630.
• UCI Machine Learning Repository: Phishing Websites Data Set. (n.d.). Retrieved
May 9, 2016, from https://archive.ics.uci.edu/ml/datasets/Phishing+Websites
• Weka 3: Data Mining with Open Source Machine Learning Software in Java.
(n.d.). Retrieved May 9, 2016, from http://www.cs.waikato.ac.nz/ml/weka/

256 ICESoS 2016 - Proceedings Book

</text>
                  </elementText>
                </elementTextContainer>
              </element>
            </elementContainer>
          </elementSet>
        </elementSetContainer>
      </file>
    </fileContainer>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information, see http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="79">
            <name>Extent</name>
            <description>The size or duration of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="1862">
                <text>3308</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="1863">
                <text>COMPARISON OF MACHINE LEARNING TECHNIQUES IN PHISHING WEBSITE CLASSIFICATION</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="1864">
                <text>Hodzic, Adnan
Kevric, Jasmin
Karadag, Adem</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="1865">
                <text>Abstract: Phishing is one of the luring strategies used by phishers to abuse the personal details of unsuspecting clients. A phishing website is a counterfeit website with a similar appearance but a different destination. Unsuspecting clients submit their information, believing that these websites belong to trusted financial institutions. New anti-phishing techniques emerge continuously, yet phishers keep devising new strategies to defeat anti-phishing mechanisms. Hence, there is a need for an effective mechanism for predicting phishing websites. This paper compares the classification of phishing websites using different machine learning algorithms. Random Forest (RF), C4.5, REP Tree, Decision Stump, Hoeffding Tree, Rotation Forest and Multilayer Perceptron (MLP) were used to determine which method provides the best results in phishing website classification. All instances are categorized as 1 for “Legitimate”, 0 for “Suspicious” and -1 for “Phishy”. Results show that RF and REP Tree achieve the best performance on this dataset for the classification of phishing websites.</text>
              </elementText>
            </elementTextContainer>
          </element>
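          <!--
Decision Stump, one of the classifiers compared in the abstract above, is a one-level decision tree that predicts from a single feature. The following is a minimal sketch, assuming a small hypothetical dataset whose features use a ternary encoding of 1 (legitimate), 0 (suspicious) and -1 (phishy) rather than the data set used in the paper. The stump tries every feature and keeps the one whose per-value majority vote makes the fewest training errors. This multiway split is a simplification, and Weka's DecisionStump may split differently. The names `fit_stump`, `predict` and `accuracy` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data (not from the paper). Features use the ternary encoding
# 1 = legitimate, 0 = suspicious, -1 = phishy; class labels are 1 or -1.
X = [
    [1, 1, 0],
    [1, 0, 1],
    [-1, 1, -1],
    [-1, 0, -1],
    [0, -1, 1],
    [-1, -1, 0],
]
y = [1, 1, -1, -1, 1, -1]


def fit_stump(X, y):
    """Return (feature_index, {value: label}) for the single-feature rule
    with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        # Count the labels seen for each value of feature j.
        votes = {}
        for row, label in zip(X, y):
            votes.setdefault(row[j], Counter())[label] += 1
        # Predict the majority label for each observed value.
        rule = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
        errors = sum(rule[row[j]] != label for row, label in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    return best[1], best[2]


def predict(stump, row, default=1):
    """Apply the stump; values unseen during training fall back to `default`."""
    j, rule = stump
    return rule.get(row[j], default)


def accuracy(stump, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(stump, row) == label for row, label in zip(X, y)) / len(y)


stump = fit_stump(X, y)
print(stump)                  # (0, {1: 1, -1: -1, 0: 1})
print(accuracy(stump, X, y))  # 1.0
```

A single split like this is the weakest of the compared learners; ensembles such as Random Forest and Rotation Forest instead combine many deeper trees.
          -->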
          <element elementId="40">
            <name>Date</name>
            <description>A point or period of time associated with an event in the lifecycle of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="1866">
                <text>2016</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="1867">
                <text>Conference or Workshop Item
PeerReviewed</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
    <tagContainer>
      <tag tagId="6">
        <name>H Social Sciences (General)</name>
      </tag>
    </tagContainer>
  </item>
  <item itemId="1200" public="1" featured="0">
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information, see http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="79">
            <name>Extent</name>
            <description>The size or duration of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="9298">
                <text>3464</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="9299">
                <text>COMPARISON OF MEDIEVAL AND MODERN METAPHORICAL CONCEPTS</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="9300">
                <text>Štrmelj, Lidija</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="9301">
                <text>This article aims to study emotion metaphors found in selected tales from Chaucer’s Canterbury Tales and to compare them with conventional modern metaphors from current dictionaries and other sources, in order to determine whether medieval emotional metaphorical concepts have survived to the present day and, if so, what changes can be observed in them. The study is based on the cognitive theory of metaphor, as developed by Lakoff and Johnson in “Metaphors We Live By”.</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="40">
            <name>Date</name>
            <description>A point or period of time associated with an event in the lifecycle of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="9302">
                <text>2014</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="9303">
                <text>Conference or Workshop Item
PeerReviewed</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
    <tagContainer>
      <tag tagId="18">
        <name>PE English</name>
      </tag>
    </tagContainer>
  </item>
  <item itemId="3608" public="1" featured="0">
    <fileContainer>
      <file fileId="4450">
        <src>https://omeka.ibu.edu.ba/files/original/8d66acc444227b11b0604c66cff36946.pdf</src>
        <authentication>6841d67bcbb7734abb69327017b5db0a</authentication>
      </file>
    </fileContainer>
    <collection collectionId="7">
      <elementSetContainer>
        <elementSet elementSetId="1">
          <name>Dublin Core</name>
          <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information, see http://dublincore.org/documents/dces/.</description>
          <elementContainer>
            <element elementId="50">
              <name>Title</name>
              <description>A name given to the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26932">
                  <text>IT Master's Theses</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="41">
              <name>Description</name>
              <description>An account of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26933">
                  <text>The IT Master's Theses collection features master's theses authored by graduate students in the Department of Information Technology. Each thesis reflects a significant research effort, combining theoretical knowledge with practical application to address complex challenges in the IT domain. These works demonstrate students' advanced understanding of information systems, software engineering, data science, cybersecurity, and emerging technologies. The theses serve as a testament to the students' ability to conduct independent research, propose innovative solutions, and contribute to the advancement of the IT field.</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="49">
              <name>Subject</name>
              <description>The topic of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26934">
                  <text>IT Master's Thesis</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="44">
              <name>Language</name>
              <description>A language of the resource</description>
              <elementTextContainer>
                <elementText elementTextId="26935">
                  <text>English Language</text>
                </elementText>
              </elementTextContainer>
            </element>
            <element elementId="96">
              <name>Author</name>
              <description>Author</description>
              <elementTextContainer>
                <elementText elementTextId="26940">
                  <text>IT Department Master’s Students</text>
                </elementText>
              </elementTextContainer>
            </element>
          </elementContainer>
        </elementSet>
      </elementSetContainer>
    </collection>
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information, see http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="27117">
                <text>Comparison of Performance and Security Aspects of Database Access via Stored Procedures and APIs</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="96">
            <name>Author</name>
            <description>Author</description>
            <elementTextContainer>
              <elementText elementTextId="27118">
                <text>Ramiz Šuvalija</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="94">
            <name>Abstract</name>
            <description>A summary of the resource.</description>
            <elementTextContainer>
              <elementText elementTextId="27119">
                <text>Modern applications typically access data in one of two ways: through an API acting as an intermediary layer, or through stored procedures executed in the database itself. The aim of this study is to compare these methods primarily in terms of performance, and then in terms of security, maintainability and scalability. The project will implement the same functionality both as stored procedures in a PostgreSQL database and as an API backend in Python. Query execution time, resource consumption and susceptibility to security flaws will be evaluated. Each comparison will be repeated 10 times to make the results as accurate and reliable as possible. A further aim is to derive practical recommendations on when to use stored procedures and when to use the API approach, that is, where to draw the boundary (equilibrium) between logic placed in the database alongside the data and logic placed in the application layer.
Today, with applications widely deployed in distributed environments, understanding these trade-offs is essential for the smooth and effective development of information systems, particularly in fields that process large volumes of data, such as e-business and banking.</text>
              </elementText>
            </elementTextContainer>
          </element>
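          <!--
The abstract above plans to time each access method over 10 runs and to contrast logic placed next to the data with logic placed in the application layer. The following is a minimal sketch of that measurement loop, assuming only Python's standard library. SQLite stands in for PostgreSQL; it has no stored procedures, so in-database logic is approximated by a single aggregate query. The `orders` table and the functions `total_in_db`, `total_in_app` and `benchmark` are hypothetical and are not the thesis's code. The parameterized query also illustrates one kind of security flaw such a comparison may cover, SQL injection.

```python
import sqlite3
import statistics
import time

# Hypothetical schema and data. SQLite stands in for PostgreSQL here and has no
# stored procedures, so in-database logic is approximated by one aggregate query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"c{i % 100}", float(i % 50)) for i in range(10_000)],
)


def total_in_db(customer):
    """Logic next to the data: filter and aggregate inside the database."""
    # The value is bound as a parameter, so it cannot change the SQL statement.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]


def total_in_app(customer):
    """Logic in the application layer: fetch rows, then filter and aggregate in Python."""
    rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
    return sum(amount for name, amount in rows if name == customer)


def benchmark(fn, *args, runs=10):
    """Run fn(*args) `runs` times; return (mean, stdev) of wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)


for label, fn in [("database", total_in_db), ("app layer", total_in_app)]:
    mean, sd = benchmark(fn, "c7")
    print(f"{label}: mean {mean * 1000:.3f} ms, stdev {sd * 1000:.3f} ms")
```

Because SQLite runs in-process, this sketch omits the network round trips that a PostgreSQL server and a separate API service would add, so its timings are not comparable to the study's measurements.
          -->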
          <element elementId="97">
            <name>Keywords</name>
            <description>Keywords.</description>
            <elementTextContainer>
              <elementText elementTextId="27120">
                <text>Stored Procedures, API, Comparison, Database Access, PostgreSQL</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
    <tagContainer>
      <tag tagId="196">
        <name>Api</name>
      </tag>
      <tag tagId="197">
        <name>Comparison</name>
      </tag>
      <tag tagId="198">
        <name>Database Access</name>
      </tag>
      <tag tagId="200">
        <name>master’s thesis</name>
      </tag>
      <tag tagId="199">
        <name>Postgresql</name>
      </tag>
      <tag tagId="195">
        <name>Stored Procedures</name>
      </tag>
    </tagContainer>
  </item>
</itemContainer>
