Author Topic: [ATI.eu] Indexing and search engine issues  (Read 3764 times)


Offline Johann

  • Samanera
  • Very Engaged Member
  • *
  • Sadhu! or +361/-0
  • Gender: Male
  • Date of ordination/Datum der Ordination.: 20140527
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #30 on: March 28, 2019, 08:32:56 AM »
Currently not using search or batchedit, so proceed however Nyom thinks best.

(There is a built-in indexer.php which, it is said, can be executed directly on the server to rebuild the index. Maybe that helps. https://www.dokuwiki.org/cli#indexerphp )
This post and Content has come to be by Dhamma-Dana and so is given as it       Dhamma-Dana: Johann

Offline Moritz

  • Chief housekeeper / Chefhausmeister
  • Very Engaged Member
  • *
  • Sadhu! or +248/-0
  • Gender: Male
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #31 on: March 28, 2019, 08:48:11 AM »
Quote
Currently not using search or batchedit, so proceed however Nyom thinks best.

(There is a built-in indexer.php which, it is said, can be executed directly on the server to rebuild the index. Maybe that helps. https://www.dokuwiki.org/cli#indexerphp )

Rebuilding index started.

The helper scripts listed on https://www.dokuwiki.org/cli are only usable with shell access on the server, and that is not available on the Greensta server here. (They may still be worth looking into and adapting when there is more time for it.) So I am just using the previous approach for now.

_/\_

Offline Johann

  • Samanera
  • Very Engaged Member
  • *
  • Sadhu! or +361/-0
  • Gender: Male
  • Date of ordination/Datum der Ordination.: 20140527
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #32 on: March 28, 2019, 08:51:04 AM »
Sadhu

Offline Moritz

  • Chief housekeeper / Chefhausmeister
  • Very Engaged Member
  • *
  • Sadhu! or +248/-0
  • Gender: Male
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #33 on: March 29, 2019, 08:25:48 AM »
I accidentally restarted the index rebuild from scratch, so progress is back at about 5000/20000 pages.

I wrote a new script, adapting methods from the CLI script, so that the whole process runs on the server and no longer needs an open connection and browser window to send a command for every single page, one by one.
This should be at least a little faster without the commands and responses going back and forth, but the speed difference is not really noticeable. So it should, again, be finished in one day.

The current progress can be seen by opening http://accesstoinsight.eu/indexer.success.log (listing pages that were indexed successfully) and http://accesstoinsight.eu/indexer.error.log (listing pages which could not be indexed for some reason, currently empty).
Each page name in the lists is preceded by a running number, so one can see how many pages have already been processed.
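A server-side loop of the kind described above could look roughly like the following. This is only an illustrative sketch, not the script actually used: `runBatch`, its parameters and the `$indexPage` callback are hypothetical names, and in a real DokuWiki setup the callback would wrap whatever indexing call the CLI script uses.

```php
<?php
// Sketch only: runBatch() and $indexPage are hypothetical, not DokuWiki API.
// $indexPage is any callable that returns true on success and false
// (or throws) on failure.
function runBatch(array $pages, callable $indexPage, string $okLog, string $errLog): array
{
    $counts = ['ok' => 0, 'failed' => 0];
    foreach (array_values($pages) as $i => $page) {
        $n = $i + 1; // running number written before each page name
        try {
            $ok = (bool) $indexPage($page);
        } catch (Throwable $e) {
            $ok = false; // one failing page should not stop the whole run
        }
        // Append immediately so the logs show progress while the batch runs.
        file_put_contents($ok ? $okLog : $errLog, "$n $page\n", FILE_APPEND);
        $counts[$ok ? 'ok' : 'failed']++;
    }
    return $counts;
}
```

Note that PHP's "Allowed memory size exhausted" is a fatal error and cannot be caught with try/catch, so a loop like this would still stop if a single page exceeded the memory limit.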

_/\_

Offline Johann

  • Samanera
  • Very Engaged Member
  • *
  • Sadhu! or +361/-0
  • Gender: Male
  • Date of ordination/Datum der Ordination.: 20140527
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #34 on: March 29, 2019, 09:44:28 AM »
Sadhu

Offline Moritz

  • Chief housekeeper / Chefhausmeister
  • Very Engaged Member
  • *
  • Sadhu! or +248/-0
  • Gender: Male
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #35 on: March 30, 2019, 08:48:15 AM »
The indexing script I had started on the server (which should do just the same as the CLI indexer script) stopped at some point because it ran out of memory (RAM, not disk space). It seems that certain pages simply cannot be indexed because the indexer needs too much memory for them.
For example, http://accesstoinsight.eu/cs-th:tika:sut.dn.0_tik and the following pages always fail with
Code:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 67108872 bytes) in /var/www/clients/client2157/web5417/web/inc/indexer.php on line 612
or similar.

Line 612 is here:
Code:
$wordlist = explode(' ', $text);
which splits the whole text of a page into single words at the spaces.

But I really do not understand why this would take so much memory. Also, replicating the same operation on my computer (splitting the same page text into single words with the same methods and storing them in a PHP variable) does not need nearly as much memory.
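For orientation, the limit in the error message (134217728 bytes) is 128 MiB, and the failed request (67108872 bytes) is a single block of 64 MiB plus 8 bytes. One way to see where memory goes in such a split is to measure it directly. The sketch below is only an illustration under assumptions: the helper name `explodeCost` and the sample text are made up, and whether the indexer on the server behaves the same way is not confirmed. It splits a space-separated string with `explode` and reports how many bytes PHP's memory grew per resulting word.

```php
<?php
// Sketch only: explodeCost() is a hypothetical helper for illustration.
// Returns the approximate number of bytes of PHP memory used per word
// when $text is split on single spaces.
function explodeCost(string $text): float
{
    $before = memory_get_usage();
    $words = explode(' ', $text);
    $after = memory_get_usage();
    return ($after - $before) / count($words);
}

// 200,000 three-byte "words" separated by spaces, roughly what a page of
// Thai text looks like if every character is treated as its own word.
$text = implode(' ', array_fill(0, 200000, "\u{0E01}"));
printf("text: %d bytes, cost per word: %.1f bytes\n", strlen($text), explodeCost($text));
```

The per-word figure depends on the PHP version and build, which may be one reason why the same page behaves differently on two machines.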

I tried to find a way to work around it, but have given up for now.

I continued indexing with the other method, which runs locally on my computer, sends a command over the network for every single page to be indexed, and does not stop if a page fails. About 11000 pages are indexed so far (with many "holes" where pages just cannot be indexed on the current server).

It should be finished in about 16 hours if simply left to run. But with the current server infrastructure it seems the search index will always be incomplete.

_/\_

Offline Johann

  • Samanera
  • Very Engaged Member
  • *
  • Sadhu! or +361/-0
  • Gender: Male
  • Date of ordination/Datum der Ordination.: 20140527
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #36 on: March 30, 2019, 10:07:07 AM »
Sadhu for effort and care. May Nyom always give/take himself his time.

(The "big pages", which Atma thinks make up about 10 %, will not change later in content, just like the rest of the CSCD Tipitaka. Atma remembers that when there was still a search engine on ZzE, it was likewise never possible to index all the Pali Tipitaka pages of the original ATI; there were always errors.

On the other side, on ZzE once and also now on ati.eu, there have been times where the index was obviously complete.)

« Last Edit: March 30, 2019, 10:29:30 AM by Johann »

Offline Moritz

  • Chief housekeeper / Chefhausmeister
  • Very Engaged Member
  • *
  • Sadhu! or +248/-0
  • Gender: Male
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #37 on: March 31, 2019, 07:23:59 PM »
Quote
May Nyom always give/take himself his time.
_/\_

Indexing finished some time this morning.

Quote
On the other side, on ZzE once and also now on ati.eu, there have been times where the index was obviously complete.)
Obviously (offensichtlich)? Or apparently (offenbar, scheinbar, anscheinend)?

I think maybe the latter, because these errors would never appear in the Searchindex Manager plugin. It would just say "page already up to date" or something, when a page could not be indexed.

After several retries of the files that failed, the only pages that still cannot be indexed are 474 pages in Thai script (listed below). I think the reason is the way DokuWiki handles some Asian scripts, including Thai: it treats every character as a single word, which takes a lot of memory in the indexer. Quote from the file inc/indexer.php, line 18 and following:
Code:
// Asian characters are handled as words. The following regexp defines the
// Unicode-Ranges for Asian characters
// Ranges taken from http://en.wikipedia.org/wiki/Unicode_block
// I'm no language expert. If you think some ranges are wrongly chosen or
// a range is missing, please contact me
define('IDX_ASIAN1','[\x{0E00}-\x{0E7F}]'); // Thai

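To make the effect concrete, here is a small sketch of what "every character is a word" means for the word count. It is not DokuWiki's tokenizer: the function `countTokens` is hypothetical and only imitates the idea by putting spaces around each character in the Thai range quoted above before splitting.

```php
<?php
// Sketch only: countTokens() is a hypothetical helper, not part of DokuWiki.
// It surrounds every character in the Thai block (U+0E00 to U+0E7F) with
// spaces and then splits on whitespace, so each Thai character becomes a word.
function countTokens(string $text): int
{
    $spaced = preg_replace('/([\x{0E00}-\x{0E7F}])/u', ' $1 ', $text);
    $words = preg_split('/\s+/u', $spaced, -1, PREG_SPLIT_NO_EMPTY);
    return count($words);
}

// Latin script: one token per space-separated word.
echo countTokens('sabbe sattā sukhitā hontu'), "\n";
// Thai script: one token per character, even without any spaces.
echo countTokens("\u{0E2A}\u{0E1E}\u{0E40}\u{0E1E}"), "\n";
```

Under this scheme a long commentary page in Thai script would produce as many words as it has characters, which would fit the observation that only the Thai-script pages fail.
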
I have deleted all files in en:s and de:s which were just examples on how to integrate Google Site Search and comments about other search engines tested by Mr. Bullitt for accesstoinsight.org in the past.

List of unindexed Thai script files:
Code:
cs-th:atthakatha:sut.kn.jat.v01_att
cs-th:atthakatha:sut.kn.jat.v02_att
cs-th:atthakatha:sut.kn.jat.v03_att
cs-th:atthakatha:sut.kn.jat.v04_att
cs-th:atthakatha:sut.kn.jat.v05_att
cs-th:atthakatha:sut.kn.jat.v06_att
cs-th:atthakatha:sut.kn.jat.v07_att
cs-th:atthakatha:sut.kn.jat.v08_att
cs-th:atthakatha:sut.kn.jat.v09_att
cs-th:atthakatha:sut.kn.jat.v10_att
cs-th:atthakatha:sut.kn.jat.v11_att
cs-th:atthakatha:sut.kn.jat.v12_att
cs-th:atthakatha:sut.kn.jat.v13_att
cs-th:atthakatha:sut.kn.jat.v14_att
cs-th:atthakatha:sut.kn.jat.v15_att
cs-th:atthakatha:sut.kn.jat.v16_att
cs-th:atthakatha:sut.kn.jat.v17_att
cs-th:atthakatha:sut.kn.jat.v18_att
cs-th:atthakatha:sut.kn.jat.v19_att
cs-th:atthakatha:sut.kn.jat.v20_att
cs-th:atthakatha:sut.kn.jat.v21_att
cs-th:atthakatha:sut.kn.jat.v22_att
cs-th:atthakatha:sut.kn.jat.v23_att
cs-th:atthakatha:sut.kn.khp.0_att
cs-th:atthakatha:sut.kn.khp.1_att
cs-th:atthakatha:sut.kn.khp.2_att
cs-th:atthakatha:sut.kn.khp.3_att
cs-th:atthakatha:sut.kn.khp.4_att
cs-th:atthakatha:sut.kn.khp.5_att
cs-th:atthakatha:sut.kn.khp.6_att
cs-th:atthakatha:sut.kn.khp.7_att
cs-th:atthakatha:sut.kn.khp.8_att
cs-th:atthakatha:sut.kn.khp.9_att
cs-th:atthakatha:sut.kn.man.00_att
cs-th:atthakatha:sut.kn.man.01_att
cs-th:atthakatha:sut.kn.man.02_att
cs-th:atthakatha:sut.kn.man.03_att
cs-th:atthakatha:sut.kn.man.04_att
cs-th:atthakatha:sut.kn.man.05_att
cs-th:atthakatha:sut.kn.man.06_att
cs-th:atthakatha:sut.kn.man.07_att
cs-th:atthakatha:sut.kn.man.08_att
cs-th:atthakatha:sut.kn.man.09_att
cs-th:atthakatha:sut.kn.man.10_att
cs-th:atthakatha:sut.kn.man.11_att
cs-th:atthakatha:sut.kn.man.12_att
cs-th:atthakatha:sut.kn.man.13_att
cs-th:atthakatha:sut.kn.man.14_att
cs-th:atthakatha:sut.kn.man.15_att
cs-th:atthakatha:sut.kn.man.16_att
cs-th:atthakatha:sut.kn.net.0_att
cs-th:atthakatha:sut.kn.net.1_att
cs-th:atthakatha:sut.kn.net.2_att
cs-th:atthakatha:sut.kn.net.3_att
cs-th:atthakatha:sut.kn.net.4_att
cs-th:atthakatha:sut.kn.net.5_att
cs-th:atthakatha:sut.kn.net.6_att
cs-th:atthakatha:sut.kn.pat.v0_att
cs-th:atthakatha:sut.kn.pat.v1.01_att
cs-th:atthakatha:sut.kn.pat.v1.02_att
cs-th:atthakatha:sut.kn.pat.v1.03_att
cs-th:atthakatha:sut.kn.pat.v1.04_att
cs-th:atthakatha:sut.kn.pat.v1.05_att
cs-th:atthakatha:sut.kn.pat.v1.06_att
cs-th:atthakatha:sut.kn.pat.v1.07_att
cs-th:atthakatha:sut.kn.pat.v1.08_att
cs-th:atthakatha:sut.kn.pat.v1.09_att
cs-th:atthakatha:sut.kn.pat.v1.10_att
cs-th:atthakatha:sut.kn.pat.v1_att
cs-th:atthakatha:sut.kn.pat.v2_att
cs-th:atthakatha:sut.kn.pat.v3.01_att
cs-th:atthakatha:sut.kn.pat.v3.02_att
cs-th:atthakatha:sut.kn.pat.v3.03_att
cs-th:atthakatha:sut.kn.pat.v3.04_att
cs-th:atthakatha:sut.kn.pat.v3.05_att
cs-th:atthakatha:sut.kn.pat.v3.06_att
cs-th:atthakatha:sut.kn.pat.v3.07_att
cs-th:atthakatha:sut.kn.pat.v3.08_att
cs-th:atthakatha:sut.kn.pat.v3.09_att
cs-th:atthakatha:sut.kn.pat.v3.10_att
cs-th:atthakatha:sut.kn.pat.v3_att
cs-th:atthakatha:sut.kn.pev.0_att
cs-th:atthakatha:sut.kn.pev.1_att
cs-th:atthakatha:sut.kn.pev.2_att
cs-th:atthakatha:sut.kn.pev.3_att
cs-th:atthakatha:sut.kn.pev.4_att
cs-th:atthakatha:sut.kn.snp.1_att
cs-th:atthakatha:sut.kn.snp.2_att
cs-th:atthakatha:sut.kn.snp.3_att
cs-th:atthakatha:sut.kn.snp.4_att
cs-th:atthakatha:sut.kn.snp.5_att
cs-th:atthakatha:sut.kn.tha.00_att
cs-th:atthakatha:sut.kn.tha.01_att
cs-th:atthakatha:sut.kn.tha.02_att
cs-th:atthakatha:sut.kn.tha.03_att
cs-th:atthakatha:sut.kn.tha.04_att
cs-th:atthakatha:sut.kn.tha.05_att
cs-th:atthakatha:sut.kn.tha.06_att
cs-th:atthakatha:sut.kn.tha.07_att
cs-th:atthakatha:sut.kn.tha.08_att
cs-th:atthakatha:sut.kn.tha.09_att
cs-th:atthakatha:sut.kn.tha.10_att
cs-th:atthakatha:sut.kn.tha.11_att
cs-th:atthakatha:sut.kn.tha.12_att
cs-th:atthakatha:sut.kn.tha.13_att
cs-th:atthakatha:sut.kn.tha.14_att
cs-th:atthakatha:sut.kn.tha.15_att
cs-th:atthakatha:sut.kn.tha.16_att
cs-th:atthakatha:sut.kn.tha.17_att
cs-th:atthakatha:sut.kn.tha.18_att
cs-th:atthakatha:sut.kn.tha.19_att
cs-th:atthakatha:sut.kn.tha.20_att
cs-th:atthakatha:sut.kn.tha.21_att
cs-th:atthakatha:sut.kn.thi.01_att
cs-th:atthakatha:sut.kn.thi.02_att
cs-th:atthakatha:sut.kn.thi.03_att
cs-th:atthakatha:sut.kn.thi.04_att
cs-th:atthakatha:sut.kn.thi.05_att
cs-th:atthakatha:sut.kn.thi.06_att
cs-th:atthakatha:sut.kn.thi.07_att
cs-th:atthakatha:sut.kn.thi.08_att
cs-th:atthakatha:sut.kn.thi.09_att
cs-th:atthakatha:sut.kn.thi.10_att
cs-th:atthakatha:sut.kn.thi.11_att
cs-th:atthakatha:sut.kn.thi.12_att
cs-th:atthakatha:sut.kn.thi.13_att
cs-th:atthakatha:sut.kn.thi.14_att
cs-th:atthakatha:sut.kn.thi.15_att
cs-th:atthakatha:sut.kn.thi.16_att
cs-th:atthakatha:sut.kn.uda.0_att
cs-th:atthakatha:sut.kn.uda.1_att
cs-th:atthakatha:sut.kn.uda.2_att
cs-th:atthakatha:sut.kn.uda.3_att
cs-th:atthakatha:sut.kn.uda.4_att
cs-th:atthakatha:sut.kn.uda.5_att
cs-th:atthakatha:sut.kn.uda.6_att
cs-th:atthakatha:sut.kn.uda.7_att
cs-th:atthakatha:sut.kn.uda.8_att
cs-th:atthakatha:sut.kn.viv.v0_att
cs-th:atthakatha:sut.kn.viv.v1_att
cs-th:atthakatha:sut.kn.viv.v2_att
cs-th:atthakatha:sut.mn.v00_att
cs-th:atthakatha:sut.mn.v01_att
cs-th:atthakatha:sut.mn.v02_att
cs-th:atthakatha:sut.mn.v03_att
cs-th:atthakatha:sut.mn.v04_att
cs-th:atthakatha:sut.mn.v05_att
cs-th:atthakatha:sut.mn.v06_att
cs-th:atthakatha:sut.mn.v07_att
cs-th:atthakatha:sut.mn.v08_att
cs-th:atthakatha:sut.mn.v09_att
cs-th:atthakatha:sut.mn.v10_att
cs-th:atthakatha:sut.mn.v11_att
cs-th:atthakatha:sut.mn.v12_att
cs-th:atthakatha:sut.mn.v13_att
cs-th:atthakatha:sut.mn.v14_att
cs-th:atthakatha:sut.mn.v15_att
cs-th:atthakatha:sut.sn.00_att
cs-th:atthakatha:sut.sn.01_att
cs-th:atthakatha:sut.sn.02_att
cs-th:atthakatha:sut.sn.03_att
cs-th:atthakatha:sut.sn.04_att
cs-th:atthakatha:sut.sn.05_att
cs-th:atthakatha:sut.sn.06_att
cs-th:atthakatha:sut.sn.07_att
cs-th:atthakatha:sut.sn.08_att
cs-th:atthakatha:sut.sn.09_att
cs-th:atthakatha:sut.sn.10_att
cs-th:atthakatha:sut.sn.11_att
cs-th:atthakatha:sut.sn.12_att
cs-th:atthakatha:sut.sn.13_att
cs-th:atthakatha:sut.sn.14_att
cs-th:atthakatha:sut.sn.15_att
cs-th:atthakatha:sut.sn.16_att
cs-th:atthakatha:sut.sn.17_att
cs-th:atthakatha:sut.sn.18_att
cs-th:atthakatha:sut.sn.19_att
cs-th:atthakatha:sut.sn.20_att
cs-th:atthakatha:sut.sn.21_att
cs-th:atthakatha:sut.sn.22_att
cs-th:atthakatha:sut.sn.23_att
cs-th:atthakatha:sut.sn.24_att
cs-th:atthakatha:sut.sn.25_att
cs-th:atthakatha:sut.sn.26_att
cs-th:atthakatha:sut.sn.27_att
cs-th:atthakatha:sut.sn.28_att
cs-th:atthakatha:sut.sn.29_att
cs-th:atthakatha:sut.sn.30_att
cs-th:atthakatha:sut.sn.31_att
cs-th:atthakatha:sut.sn.32_att
cs-th:atthakatha:sut.sn.33_att
cs-th:atthakatha:sut.sn.34_att
cs-th:atthakatha:sut.sn.35_att
cs-th:atthakatha:sut.sn.36_att
cs-th:atthakatha:sut.sn.37_att
cs-th:atthakatha:sut.sn.38_att
cs-th:atthakatha:sut.sn.39_att
cs-th:atthakatha:sut.sn.40_att
cs-th:atthakatha:sut.sn.41_att
cs-th:atthakatha:sut.sn.42_att
cs-th:atthakatha:sut.sn.43_att
cs-th:atthakatha:sut.sn.44_att
cs-th:atthakatha:sut.sn.45_att
cs-th:atthakatha:sut.sn.46_att
cs-th:atthakatha:sut.sn.47_att
cs-th:atthakatha:sut.sn.48_att
cs-th:atthakatha:sut.sn.49_att
cs-th:atthakatha:sut.sn.50_att
cs-th:atthakatha:sut.sn.51_att
cs-th:atthakatha:sut.sn.52_att
cs-th:atthakatha:sut.sn.53_att
cs-th:atthakatha:sut.sn.54_att
cs-th:atthakatha:sut.sn.55_att
cs-th:atthakatha:sut.sn.56_att
cs-th:atthakatha:vin.cv.01_att
cs-th:atthakatha:vin.cv.02_att
cs-th:atthakatha:vin.cv.03_att
cs-th:atthakatha:vin.cv.04_att
cs-th:atthakatha:vin.cv.05_att
cs-th:atthakatha:vin.cv.06_att
cs-th:atthakatha:vin.cv.07_att
cs-th:atthakatha:vin.cv.08_att
cs-th:atthakatha:vin.cv.09_att
cs-th:atthakatha:vin.cv.10_att
cs-th:atthakatha:vin.cv.11_att
cs-th:atthakatha:vin.cv.12_att
cs-th:atthakatha:vin.mv.01_att
cs-th:atthakatha:vin.mv.02_att
cs-th:atthakatha:vin.mv.03_att
cs-th:atthakatha:vin.mv.04_att
cs-th:atthakatha:vin.mv.05_att
cs-th:atthakatha:vin.mv.06_att
cs-th:atthakatha:vin.mv.07_att
cs-th:atthakatha:vin.mv.08_att
cs-th:atthakatha:vin.mv.09_att
cs-th:atthakatha:vin.mv.10_att
cs-th:atthakatha:vin.pac.ak_att
cs-th:atthakatha:vin.pac.nii_att
cs-th:atthakatha:vin.pac.pc_att
cs-th:atthakatha:vin.pac.pci_att
cs-th:atthakatha:vin.pac.pd_att
cs-th:atthakatha:vin.pac.pdi_att
cs-th:atthakatha:vin.pac.pri_att
cs-th:atthakatha:vin.pac.sgi_att
cs-th:atthakatha:vin.pac.sk_att
cs-th:atthakatha:vin.par.ay_att
cs-th:atthakatha:vin.par.ga_att
cs-th:atthakatha:vin.par.ni_att
cs-th:atthakatha:vin.par.pr_att
cs-th:atthakatha:vin.par.sg_att
cs-th:atthakatha:vin.par.ve_att
cs-th:atthakatha:vin.pv.01_att
cs-th:atthakatha:vin.pv.02_att
cs-th:atthakatha:vin.pv.03_att
cs-th:atthakatha:vin.pv.04_att
cs-th:atthakatha:vin.pv.05_att
cs-th:atthakatha:vin.pv.06_att
cs-th:atthakatha:vin.pv.07_att
cs-th:atthakatha:vin.pv.08_att
cs-th:atthakatha:vin.pv.09_att
cs-th:atthakatha:vin.pv.10_att
cs-th:atthakatha:vin.pv.11_att
cs-th:atthakatha:vin.pv.12_att
cs-th:atthakatha:vin.pv.13_att
cs-th:atthakatha:vin.pv.14_att
cs-th:atthakatha:vin.pv.15_att
cs-th:atthakatha:vin.pv.16_att
cs-th:atthakatha:vin.pv.17_att
cs-th:atthakatha:vin.pv.18_att
cs-th:tika:abh.ava-pura.01_tik
cs-th:tika:abh.ava-pura.02_tik
cs-th:tika:abh.ava-pura.03_tik
cs-th:tika:abh.ava-pura.04_tik
cs-th:tika:abh.ava-pura.05_tik
cs-th:tika:abh.ava-pura.06_tik
cs-th:tika:abh.ava-pura.07_tik
cs-th:tika:abh.ava-pura.08_tik
cs-th:tika:abh.ava-pura.09_tik
cs-th:tika:abh.ava-pura.10_tik
cs-th:tika:abh.ava-pura.11_tik
cs-th:tika:sut.dn.01_abh_tik
cs-th:tika:sut.dn.01_tik
cs-th:tika:sut.dn.02_abh_tik
cs-th:tika:sut.dn.02_tik
cs-th:tika:sut.dn.03_abh_tik
cs-th:tika:sut.dn.03_tik
cs-th:tika:sut.dn.04_abh_tik
cs-th:tika:sut.dn.04_tik
cs-th:tika:sut.dn.05_abh_tik
cs-th:tika:sut.dn.05_tik
cs-th:tika:sut.dn.06_abh_tik
cs-th:tika:sut.dn.06_tik
cs-th:tika:sut.dn.07_abh_tik
cs-th:tika:sut.dn.07_tik
cs-th:tika:sut.dn.08_abh_tik
cs-th:tika:sut.dn.08_tik
cs-th:tika:sut.dn.09_abh_tik
cs-th:tika:sut.dn.09_tik
cs-th:tika:sut.dn.0_tik
cs-th:tika:sut.dn.10_abh_tik
cs-th:tika:sut.dn.10_tik
cs-th:tika:sut.dn.11_abh_tik
cs-th:tika:sut.dn.11_tik
cs-th:tika:sut.dn.12_abh_tik
cs-th:tika:sut.dn.12_tik
cs-th:tika:sut.dn.13_abh_tik
cs-th:tika:sut.dn.13_tik
cs-th:tika:sut.dn.14_tik
cs-th:tika:sut.dn.15_tik
cs-th:tika:sut.dn.16_tik
cs-th:tika:sut.dn.17_tik
cs-th:tika:sut.dn.18_tik
cs-th:tika:sut.dn.19_tik
cs-th:tika:sut.dn.20_tik
cs-th:tika:sut.dn.21_tik
cs-th:tika:sut.dn.22_tik
cs-th:tika:sut.dn.23_tik
cs-th:tika:sut.dn.24_tik
cs-th:tika:sut.dn.25_tik
cs-th:tika:sut.dn.26_tik
cs-th:tika:sut.dn.27_tik
cs-th:tika:sut.dn.28_tik
cs-th:tika:sut.dn.29_tik
cs-th:tika:sut.dn.30_tik
cs-th:tika:sut.dn.31_tik
cs-th:tika:sut.dn.32_tik
cs-th:tika:sut.dn.33_tik
cs-th:tika:sut.dn.34_tik
cs-th:tika:sut.kn.paka00_tik
cs-th:tika:sut.kn.paka01_tik
cs-th:tika:sut.kn.paka02_tik
cs-th:tika:sut.kn.paka03_tik
cs-th:tika:sut.kn.paka04_tik
cs-th:tika:sut.kn.paka05_tik
cs-th:tika:sut.kn.paka06_tik
cs-th:tika:sut.kn.vibh01_tik
cs-th:tika:sut.kn.vibh02_tik
cs-th:tika:sut.kn.vibh03_tik
cs-th:tika:sut.kn.vibh04_tik
cs-th:tika:sut.kn.vibh05_tik
cs-th:tika:sut.kn.vibh06_tik
cs-th:tika:sut.mn.0_tik
cs-th:tika:sut.mn.v01_tik
cs-th:tika:sut.mn.v02_tik
cs-th:tika:sut.mn.v03_tik
cs-th:tika:sut.mn.v04_tik
cs-th:tika:sut.mn.v05_tik
cs-th:tika:sut.mn.v06_tik
cs-th:tika:sut.mn.v07_tik
cs-th:tika:sut.mn.v08_tik
cs-th:tika:sut.mn.v09_tik
cs-th:tika:sut.mn.v10_tik
cs-th:tika:sut.mn.v11_tik
cs-th:tika:sut.mn.v12_tik
cs-th:tika:sut.mn.v13_tik
cs-th:tika:sut.mn.v14_tik
cs-th:tika:sut.mn.v15_tik
cs-th:tika:sut.sn.01_tik
cs-th:tika:sut.sn.02_tik
cs-th:tika:sut.sn.03_tik
cs-th:tika:sut.sn.04_tik
cs-th:tika:sut.sn.05_tik
cs-th:tika:sut.sn.06_tik
cs-th:tika:sut.sn.07_tik
cs-th:tika:sut.sn.08_tik
cs-th:tika:sut.sn.09_tik
cs-th:tika:sut.sn.0_tik
cs-th:tika:sut.sn.10_tik
cs-th:tika:sut.sn.11_tik
cs-th:tika:sut.sn.12_tik
cs-th:tika:sut.sn.13_tik
cs-th:tika:sut.sn.14_tik
cs-th:tika:sut.sn.15_tik
cs-th:tika:sut.sn.16_tik
cs-th:tika:sut.sn.17_tik
cs-th:tika:sut.sn.18_tik
cs-th:tika:sut.sn.19_tik
cs-th:tika:sut.sn.20_tik
cs-th:tika:sut.sn.21_tik
cs-th:tika:sut.sn.22_tik
cs-th:tika:sut.sn.23_tik
cs-th:tika:sut.sn.24_tik
cs-th:tika:sut.sn.25_tik
cs-th:tika:sut.sn.26_tik
cs-th:tika:sut.sn.27_tik
cs-th:tika:sut.sn.28_tik
cs-th:tika:sut.sn.29_tik
cs-th:tika:sut.sn.30_tik
cs-th:tika:sut.sn.31_tik
cs-th:tika:sut.sn.32_tik
cs-th:tika:sut.sn.33_tik
cs-th:tika:sut.sn.34_tik
cs-th:tika:sut.sn.35_tik
cs-th:tika:sut.sn.36_tik
cs-th:tika:sut.sn.37_tik
cs-th:tika:sut.sn.38_tik
cs-th:tika:sut.sn.39_tik
cs-th:tika:sut.sn.40_tik
cs-th:tika:sut.sn.41_tik
cs-th:tika:sut.sn.42_tik
cs-th:tika:sut.sn.43_tik
cs-th:tika:sut.sn.44_tik
cs-th:tika:sut.sn.45_tik
cs-th:tika:sut.sn.46_tik
cs-th:tika:sut.sn.47_tik
cs-th:tika:sut.sn.48_tik
cs-th:tika:sut.sn.49_tik
cs-th:tika:sut.sn.50_tik
cs-th:tika:sut.sn.51_tik
cs-th:tika:sut.sn.52_tik
cs-th:tika:sut.sn.53_tik
cs-th:tika:sut.sn.54_tik
cs-th:tika:sut.sn.55_tik
cs-th:tika:sut.sn.56_tik
cs-th:tika:vin.bhi.0_dvem_tik
cs-th:tika:vin.bhi.0_kank_tik
cs-th:tika:vin.bhi.0_vima_tik
cs-th:tika:vin.bhi.v_dvem_tik
cs-th:tika:vin.bhu.0_dvem_tik
cs-th:tika:vin.bhu.0_kank_tik
cs-th:tika:vin.bhu.ni_kank_tik
cs-th:tika:vin.bhu.pc_kank_tik
cs-th:tika:vin.bhu.pr_kank_tik
cs-th:tika:vin.bhu.sg_kank_tik
cs-th:tika:vin.cv.01_sara_tik
cs-th:tika:vin.cv.01_vima_tik
cs-th:tika:vin.cv.02_sara_tik
cs-th:tika:vin.cv.02_vima_tik
cs-th:tika:vin.cv.03_sara_tik
cs-th:tika:vin.cv.03_vima_tik
cs-th:tika:vin.cv.04_sara_tik
cs-th:tika:vin.cv.04_vima_tik
cs-th:tika:vin.cv.05_sara_tik
cs-th:tika:vin.cv.05_vima_tik
cs-th:tika:vin.cv.06_sara_tik
cs-th:tika:vin.cv.06_vima_tik
cs-th:tika:vin.cv.07_sara_tik
cs-th:tika:vin.cv.07_vima_tik
cs-th:tika:vin.cv.08_sara_tik
cs-th:tika:vin.cv.08_vima_tik
cs-th:tika:vin.cv.09_sara_tik
cs-th:tika:vin.cv.09_vima_tik
cs-th:tika:vin.cv.0_paci_tik
cs-th:tika:vin.cv.0_sara_tik
cs-th:tika:vin.cv.0_vaji_tik
cs-th:tika:vin.cv.0_vima_tik
cs-th:tika:vin.cv.10_sara_tik
cs-th:tika:vin.cv.10_vima_tik
cs-th:tika:vin.cv.11_sara_tik
cs-th:tika:vin.cv.11_vima_tik
cs-th:tika:vin.cv.12_sara_tik
cs-th:tika:vin.cv.12_vima_tik
cs-th:tika:vin.kank.0_kank_tik
cs-th:tika:vin.kankha.0_dvem_tik
cs-th:tika:vin.khud.01_khud_tik
cs-th:tika:vin.khud.02_khud_tik
cs-th:tika:vin.pac.pci_vima_tik
cs-th:tika:vin.vila.08_vila_tik
cs-th:tika:vin.vila.09_vila_tik
cs-th:tika:vin.vila.10_vila_tik
cs-th:tika:vin.vila.11_vila_tik
cs-th:tika:vin.vila.12_vila_tik
cs-th:tika:vin.vila.13_vila_tik
cs-th:tika:vin.vila.14_vila_tik
cs-th:tika:vin.vila.15_vila_tik
cs-th:tika:vin.vila.16_vila_tik
cs-th:tika:vin.vila.17_vila_tik
cs-th:tika:vin.vila.18_vila_tik
cs-th:tika:vin.vila.19_vila_tik
cs-th:tika:vin.vila.20_vila_tik
cs-th:tika:vin.vila.21_vila_tik
cs-th:tika:vin.vila.22_vila_tik
cs-th:tika:vin.vila.23_vila_tik
cs-th:tika:vin.vila.24_vila_tik

_/\_

Offline Johann

  • Samanera
  • Very Engaged Member
  • *
  • Sadhu! or +361/-0
  • Gender: Male
  • Date of ordination/Datum der Ordination.: 20140527
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #38 on: April 01, 2019, 01:17:50 AM »
Sadhu

Atma recognized that the search works less well for scripts other than Latin. Yet Atma does not understand why certain ranges are separated character by character.
This post and Content has come to be by Dhamma-Dana and so is given as it       Dhamma-Dana: Johann

Offline Johann

Re: [ATI.eu] Indexing and search engine issues
« Reply #39 on: April 01, 2019, 01:34:00 PM »
Atma has attached the Khmer and Thai Unicode tables so that special characters like full stops and punctuation marks can possibly be excluded ... breaking words into single characters seems to be meaningless.

Not sure if Upasaka Vorapol may like to assist here for Thai. Atma will try to list special characters in Khmer but not sure now how the indexer or better the search engine works with compunctions and so on generally (simply cut them all away?)

Offline Johann

Re: [ATI.eu] Indexing and search engine issues
« Reply #40 on: April 01, 2019, 02:19:51 PM »
17D4 KHMER SIGN KHAN • functions as a full stop, period
(→ 0E2F thai character paiyannoi
→ 104A myanmar sign little section)

17D5 KHMER SIGN BARIYOOSAN • indicates the end of a section or a text
(→ 0E5A thai character angkhankhu
→ 104B myanmar sign section)

17D6 KHMER SIGN CAMNUC PII KUUH • functions as colon • the preferred transliteration is camnoc pii kuuh
(→ 00F7 ÷ division sign
→ 0F14 tibetan mark gter tsheg)

17D7 KHMER SIGN LEK TOO • repetition sign
(→ 0E46 thai character maiyamok)

17D8 KHMER SIGN BEYYAL • et cetera • use of this character is discouraged; other abbreviations for et cetera also exist • preferred spelling: ។ល។

17D9 KHMER SIGN PHNAEK MUAN • indicates the beginning of a book or a treatise • the preferred transliteration is phnek moan
(→ 0E4F thai character fongman)

17DA KHMER SIGN KOOMUUT • indicates the end of a book or treatise • this forms a pair with 17D9 • the preferred transliteration is koomoot
(→ 0E5B thai character khomut)

17DB KHMER CURRENCY SYMBOL RIEL

17E0 KHMER DIGIT ZERO
17E1 KHMER DIGIT ONE
17E2 KHMER DIGIT TWO
17E3 KHMER DIGIT THREE
17E4 KHMER DIGIT FOUR
17E5 KHMER DIGIT FIVE
17E6 KHMER DIGIT SIX
17E7 KHMER DIGIT SEVEN
17E8 KHMER DIGIT EIGHT
17E9 KHMER DIGIT NINE

0E50 THAI DIGIT ZERO
0E51 THAI DIGIT ONE
0E52 THAI DIGIT TWO
0E53 THAI DIGIT THREE
0E54 THAI DIGIT FOUR
0E55 THAI DIGIT FIVE
0E56 THAI DIGIT SIX
0E57 THAI DIGIT SEVEN
0E58 THAI DIGIT EIGHT
0E59 THAI DIGIT NINE

Word breaks are either white spaces or zero-width spaces (200B ZERO WIDTH SPACE) in both scripts, Khmer and Thai.
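
As a rough sketch of how the signs listed above and the word breaks could be used as separators (Python only for illustration; the names `SEPARATORS` and `split_words` are made up and not taken from the actual search engine code, and treating the repetition signs 17D7 and 0E46 as separators is an assumption):

```python
import re

# Code points taken from the list above. Which of them should really act
# as separators is an assumption made for this illustration.
KHMER_SIGNS = "\u17d4\u17d5\u17d6\u17d7\u17d8\u17d9\u17da"
THAI_SIGNS = "\u0e2f\u0e46\u0e4f\u0e5a\u0e5b"
ZERO_WIDTH_SPACE = "\u200b"  # not covered by \s, so it is listed explicitly

# One or more separators in a row: white space, zero-width space, or a sign.
SEPARATORS = re.compile(
    "[\\s" + ZERO_WIDTH_SPACE + KHMER_SIGNS + THAI_SIGNS + "]+"
)


def split_words(text):
    """Split text into words at the separators, dropping empty pieces."""
    return [piece for piece in SEPARATORS.split(text) if piece]


print(split_words("buddha\u200bdhamma\u17d4 sangha"))
# prints ['buddha', 'dhamma', 'sangha']
```

The digits are left out of the separator set here, so numbers would stay searchable as words; whether that is wanted is an open question.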

Offline Moritz

  • Chief housekeeper / Chefhausmeister
  • Very Engaged Member
  • *
  • Sadhu! or +248/-0
  • Gender: Male
Re: [ATI.eu] Indexing and search engine issues
« Reply #41 on: April 01, 2019, 04:58:43 PM »
Sadhu, I was just reading the searching code, trying to understand a bit how it works.

The idea of breaking text into single characters probably came from Chinese or Japanese, where I think a character usually represents a complete word. Maybe Thai was included by mistake, on the assumption that all Asian scripts use such logographic "full word" characters.

I think it is best to just remove the Thai block from the "Asian" set and treat it "normally" like Roman etc. The Khmer block (1780–17FF) is not included there either, and search is working well. The most important change needed would probably be to use zero-width spaces as separators.
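
To illustrate the difference, here is a rough sketch in Python. It is not the actual search code: `tokenize`, `CJK_UNIFIED` and `THAI_BLOCK` are made-up names, and the CJK range is only an example of what such an "Asian" set might contain.

```python
# Ranges whose characters are each treated as a word of their own.
# Assumption: the real "Asian" set looks different; this is only an example.
CJK_UNIFIED = (0x4E00, 0x9FFF)
THAI_BLOCK = (0x0E00, 0x0E7F)


def tokenize(text, per_character_ranges):
    """Split at white space and zero-width spaces; characters inside
    per_character_ranges additionally become single-character tokens."""
    tokens = []
    current = []

    def flush():
        # Store the word collected so far, if any.
        if current:
            tokens.append("".join(current))
            current.clear()

    for char in text:
        if char.isspace() or char == "\u200b":
            flush()
        elif any(low <= ord(char) <= high for low, high in per_character_ranges):
            flush()
            tokens.append(char)
        else:
            current.append(char)
    flush()
    return tokens


thai = "\u0e18\u0e23\u0e23\u0e21\u200b\u0e14\u0e35"  # two words, zero-width space between
print(len(tokenize(thai, [CJK_UNIFIED, THAI_BLOCK])))  # prints 6
print(len(tokenize(thai, [CJK_UNIFIED])))              # prints 2
```

With the Thai block in the set, the two words fall apart into six single characters. Without it, they stay whole as long as a zero-width space or white space marks the word break.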

Quote from Johann: "not sure now how the indexer or better the search engine works with compunctions and so on generally (simply cut them all away?)"
* Moritz ("punctuation" - not "compunction" = "Gewissenhaftigkeit", as Bhante Thanissaro translates "otappa")
Punctuation marks are removed during indexing, and the resulting pieces are stored as "words" in the index, along with some reference tables to store which page has how many occurrences of each word.
When searching, first the tables are searched to find all pages which contain all the single words in the search phrase. And then, if the search phrase (or parts of the search phrase) has been put into quotes "", also the whole search phrase (or the quoted parts) is matched to find the exact occurrence in the text, including punctuation marks and so on.
For example, searching for "ist den Drei Juwelen, dem Buddha, dem Dhamma, der Sangha, gewidmet" will find one result on the page http://www.accesstoinsight.eu/km/index now.
* Moritz (Strange: It should also find the same on http://www.accesstoinsight.eu/de/index, but it does not. Seems like the index is somehow incomplete again.)
If leaving out one comma or one word, like "ist den Drei Juwelen Buddha, dem Dhamma, der Sangha, gewidmet", still put in quotes, no result would be found, because there is no exact match of the quotation.
However, if searching for the same without quotation marks around, results would be found again, just looking for every word, not for the whole phrase, and ignoring all punctuation.
* Moritz (Strange: Searching in this way also gives http://www.accesstoinsight.eu/de/index as a result, as it should be. But it did not find the match when searching for the whole quoted phrase.)

So it should still be possible then to find exact text passages including punctuation marks, if quoted. Although, as just seen, sometimes the search engine might not work as it should. ^-^
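
The two steps described above (word lookup in the tables first, then exact matching of quoted parts) could be sketched roughly like this. This is a simplified illustration in Python, not the real implementation: `build_index`, `search` and the sample pages are made up, and the real engine stores its tables differently.

```python
import re
from collections import Counter, defaultdict

WORD = re.compile(r"\w+")  # punctuation is dropped, as during indexing


def build_index(pages):
    """Map each word to {page_id: number of occurrences}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for word, count in Counter(WORD.findall(text.lower())).items():
            index[word][page_id] = count
    return index


def search(query, pages, index):
    """Return ids of pages containing every word of the query; quoted parts
    must also occur literally in the page text, punctuation included."""
    phrases = re.findall(r'"([^"]+)"', query)
    words = WORD.findall(query.lower())
    if not words:
        return []
    candidates = set(pages)
    for word in words:
        candidates &= set(index.get(word, {}))
    return sorted(
        page_id
        for page_id in candidates
        if all(phrase in pages[page_id] for phrase in phrases)
    )


# Sample pages made up for illustration.
pages = {
    "page_a": "Diese Seite ist den Drei Juwelen, dem Buddha, dem Dhamma, der Sangha, gewidmet.",
    "page_b": "Buddha, Dhamma, Sangha",
}
index = build_index(pages)
print(search('"den Drei Juwelen, dem Buddha"', pages, index))  # prints ['page_a']
print(search('"den Drei Juwelen Buddha"', pages, index))       # prints []
print(search('den Drei Juwelen Buddha', pages, index))         # prints ['page_a']
```

In this sketch the word lookup ignores case, while the quoted part is compared exactly as typed.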

I think I know now how to make the necessary changes to include the Khmer and Thai punctuation marks and zero-width spaces as separators. After adding these separators, the pages would have to be re-indexed.

I will try to do it later this week. Not today anymore.

_/\_

Offline Johann

Re: [ATI.eu] Indexing and search engine issues
« Reply #42 on: April 01, 2019, 05:11:35 PM »
Sadhu
