




Author Topic: [ATI.eu] Indexing and search engine issues  (Read 7464 times)


Offline Dhammañāṇa

  • Bhikkhu
  • Very Engaged Member
  • *
  • Sadhu! or +417/-0
  • Gender: Male
  • (Samana Johann)
  • Date of ordination/Datum der Ordination.: 20140527 Upasampadā 20240110
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #30 on: March 28, 2019, 02:32:56 PM »
Currently not using search or batchedit, however Nyom might think.

(There is an inbuilt search.php, said to be executable directly on the server to rebuild the index. Maybe that helps. https://www.dokuwiki.org/cli#indexerphp )
This post and Content has come to be by Dhamma-Dana and so is given as it       Dhamma-Dana: Johann

Offline Moritz

  • Chief housekeeper / Chefhausmeister
  • Very Engaged Member
  • *
  • Sadhu! or +299/-0
  • Gender: Male
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #31 on: March 28, 2019, 02:48:11 PM »
Quote
Currently not using search or batchedit, however Nyom might think.

(There is an inbuilt search.php, said to be executable directly on the server to rebuild the index. Maybe that helps. https://www.dokuwiki.org/cli#indexerphp )

Rebuilding index started.

The helper scripts listed on https://www.dokuwiki.org/cli are only usable if one has shell access on the server, which is not the case for the Greensta server here. (It may still be useful to look into them and adapt something when there is more time for it.) So just using the previous approach for now.

_/\_

Dhammañāṇa
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #32 on: March 28, 2019, 02:51:04 PM »
Sadhu

Moritz
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #33 on: March 29, 2019, 02:25:48 PM »
I accidentally restarted rebuilding the index again from scratch. So now, progress is again at about 5000/20000 pages.

I wrote a new script, adapting methods from the CLI script, so that the whole process runs on the server. That way there is no need to keep a connection and a browser window open all the time to send a command for every single page to be indexed, one by one.
This should be at least a little faster, since no commands and responses are sent back and forth, but the speed difference is not really noticeable. So it should, again, be finished in one day.

The current progress can be seen by opening http://accesstoinsight.eu/indexer.success.log (listing pages that were indexed successfully) and http://accesstoinsight.eu/indexer.error.log (listing pages which could not be indexed for some reason, currently empty).
There is a counting number before each page name in the lists, so one can see how many pages have already been processed.

_/\_

Dhammañāṇa
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #34 on: March 29, 2019, 03:44:28 PM »
Sadhu

Moritz
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #35 on: March 30, 2019, 02:48:15 PM »
The indexing script I had started on the server (which should be doing just the same as the CLI indexer script) stopped at some point due to running out of memory (working memory, not storage memory). It seems that certain pages simply cannot be indexed because the indexer would need too much memory for it.
For example http://accesstoinsight.eu/cs-th:tika:sut.dn.0_tik and following pages always fail with
Code: [Select]
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 67108872 bytes) in /var/www/clients/client2157/web5417/web/inc/indexer.php on line 612
or similar.

Line 612 is here:
Code: [Select]
$wordlist = explode(' ', $text);
splitting the whole text of a page into single words by spaces.

But I really do not understand why this would take so much memory. Also, replicating this same operation on my computer, splitting the same page text with the same methods into single words and storing in a variable in PHP, does not need nearly as much memory here.
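
For orientation, the two numbers in the error message can be decoded with a little arithmetic. The sketch below is in Python and purely illustrative (the names `MEMORY_LIMIT`, `FAILED_REQUEST` and `to_mib` are made up here, they are not part of DokuWiki). It shows that the limit is exactly 128 MiB and that the single allocation that failed was 64 MiB plus 8 bytes, so one request alone asked for half of the whole budget.

```python
# Figures copied from the PHP fatal error quoted above.
MEMORY_LIMIT = 134217728   # "Allowed memory size of ... bytes"
FAILED_REQUEST = 67108872  # "tried to allocate ... bytes"

MIB = 1024 * 1024  # bytes per mebibyte


def to_mib(n_bytes):
    """Split a byte count into (whole MiB, leftover bytes)."""
    return divmod(n_bytes, MIB)


limit_mib, limit_rest = to_mib(MEMORY_LIMIT)
request_mib, request_rest = to_mib(FAILED_REQUEST)

print(f"limit:   {limit_mib} MiB + {limit_rest} bytes")
print(f"request: {request_mib} MiB + {request_rest} bytes")
```

A request of that size suggests one large structure being grown in a single step rather than many small allocations adding up, though that is only an inference from the message, not something confirmed on the server.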

I tried to find a way to work around it, but have given up for now.

Continued indexing with the other method (which runs locally on my computer, sends a command through the network for every single page to be indexed, and does not stop if a page fails to be indexed). Currently indexed up to ~11000 pages (with many "holes" of pages which just cannot be indexed with the current server).
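
The page-by-page approach can be sketched roughly as follows. This is a minimal Python illustration, not the script actually used: `index_all`, `index_one` and `fake_index_one` are hypothetical names, the real request to the server is replaced by a stand-in that fails for Thai-script pages, and `en:index` and `de:index` are invented page names. It shows the two properties described in this thread: a failed page does not stop the run, and every log line starts with a running number.

```python
import io


def index_all(pages, index_one, success_log, error_log):
    """Index every page in order, logging numbered results.

    A failing page is written to error_log and skipped, so the run
    always continues to the next page. Returns the failed pages.
    """
    failed = []
    for count, page in enumerate(pages, start=1):
        try:
            index_one(page)  # assumption: raises an exception on failure
        except Exception as exc:
            error_log.write(f"{count} {page}: {exc}\n")
            failed.append(page)
        else:
            success_log.write(f"{count} {page}\n")
    return failed


def fake_index_one(page):
    """Stand-in for the real per-page request; fails for cs-th pages."""
    if page.startswith("cs-th:"):
        raise MemoryError("out of memory")


demo_pages = ["en:index", "cs-th:tika:sut.dn.0_tik", "de:index"]
ok_log, err_log = io.StringIO(), io.StringIO()
failed_pages = index_all(demo_pages, fake_index_one, ok_log, err_log)

print(ok_log.getvalue(), end="")
print(err_log.getvalue(), end="")
```

Returning the failed pages makes it easy to retry only those later instead of starting again from scratch.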

It should be finished in some 16 hours if simply left to run. But with the current server infrastructure it seems the search index will always be incomplete.

_/\_

Dhammañāṇa
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #36 on: March 30, 2019, 04:07:07 PM »
Sadhu for effort and care. May Nyom always give/take himself his time.

(The "big pages", Atma thinks about 10 %, like the others of the CSCD Tipitaka, will not change later on in regard to content. Atma remembers that when there was still a search engine on ZzE, it was likewise never possible to index all Pali Tipitaka pages of the original ATI; there were always errors.

On the other side, on ZzE once and also now on ati.eu, there have been times where the index was obviously complete.)

« Last Edit: March 30, 2019, 04:29:30 PM by Johann »

Moritz
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #37 on: April 01, 2019, 12:23:59 AM »
Quote
May Nyom always give/take himself his time.
_/\_

Indexing finished some time this morning.

Quote
On the other side, on ZzE once and also now on ati.eu, there have been times where the index was obviously complete.)
Obviously (offensichtlich)? Or apparently (offenbar, scheinbar, anscheinend)?

I think maybe the latter, because these errors would never appear in the Searchindex Manager plugin. It would just say "page already up to date" or something, when a page could not be indexed.

After retrying several times to index the files which failed, the only files which still could not be indexed are 474 pages in Thai script (listed below). I think the reason is the way DokuWiki handles some Asian scripts, including Thai: it treats every character as a single word, which would take a lot of memory for the indexer. Quote from the inc/indexer.php file, line 18 and following:
Code: [Select]
// Asian characters are handled as words. The following regexp defines the
// Unicode-Ranges for Asian characters
// Ranges taken from http://en.wikipedia.org/wiki/Unicode_block
// I'm no language expert. If you think some ranges are wrongly chosen or
// a range is missing, please contact me
define('IDX_ASIAN1','[\x{0E00}-\x{0E7F}]'); // Thai
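
To make the effect of "every character is a word" concrete, here is a small Python sketch. It only mimics the behaviour described above and is not DokuWiki's actual code: `THAI_CHAR` and `tokenize` are names invented for this illustration, and padding each matched character with spaces before splitting is an assumption about how such a rule could be applied. The sample text is a short Pali phrase, once in Latin letters (without diacritics) and once in Thai script.

```python
import re

# Same Unicode block as IDX_ASIAN1 in the quote above: Thai, U+0E00 to U+0E7F.
THAI_CHAR = re.compile("([\u0E00-\u0E7F])")


def tokenize(text):
    """Split text into words, treating every Thai character as its own word."""
    # Pad each Thai character with spaces, then split on whitespace.
    spaced = THAI_CHAR.sub(r" \1 ", text)
    return spaced.split()


latin_text = "evam me sutam"
# The same three words written in Thai script (given as escapes for safety).
thai_text = "\u0e40\u0e2d\u0e27\u0e4d \u0e40\u0e21 \u0e2a\u0e38\u0e15\u0e4d"

print(len(tokenize(latin_text)))
print(len(tokenize(thai_text)))
```

The Latin version gives 3 words, the Thai version 10, one per character. On a page with hundreds of thousands of characters the word list grows in the same proportion, which fits the memory errors seen for the Thai-script pages.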

I have deleted all files in en:s and de:s which were just examples on how to integrate Google Site Search and comments about other search engines tested by Mr. Bullitt for accesstoinsight.org in the past.

List of unindexed Thai script files:
Code: [Select]
cs-th:atthakatha:sut.kn.jat.v01_att
cs-th:atthakatha:sut.kn.jat.v02_att
cs-th:atthakatha:sut.kn.jat.v03_att
cs-th:atthakatha:sut.kn.jat.v04_att
cs-th:atthakatha:sut.kn.jat.v05_att
cs-th:atthakatha:sut.kn.jat.v06_att
cs-th:atthakatha:sut.kn.jat.v07_att
cs-th:atthakatha:sut.kn.jat.v08_att
cs-th:atthakatha:sut.kn.jat.v09_att
cs-th:atthakatha:sut.kn.jat.v10_att
cs-th:atthakatha:sut.kn.jat.v11_att
cs-th:atthakatha:sut.kn.jat.v12_att
cs-th:atthakatha:sut.kn.jat.v13_att
cs-th:atthakatha:sut.kn.jat.v14_att
cs-th:atthakatha:sut.kn.jat.v15_att
cs-th:atthakatha:sut.kn.jat.v16_att
cs-th:atthakatha:sut.kn.jat.v17_att
cs-th:atthakatha:sut.kn.jat.v18_att
cs-th:atthakatha:sut.kn.jat.v19_att
cs-th:atthakatha:sut.kn.jat.v20_att
cs-th:atthakatha:sut.kn.jat.v21_att
cs-th:atthakatha:sut.kn.jat.v22_att
cs-th:atthakatha:sut.kn.jat.v23_att
cs-th:atthakatha:sut.kn.khp.0_att
cs-th:atthakatha:sut.kn.khp.1_att
cs-th:atthakatha:sut.kn.khp.2_att
cs-th:atthakatha:sut.kn.khp.3_att
cs-th:atthakatha:sut.kn.khp.4_att
cs-th:atthakatha:sut.kn.khp.5_att
cs-th:atthakatha:sut.kn.khp.6_att
cs-th:atthakatha:sut.kn.khp.7_att
cs-th:atthakatha:sut.kn.khp.8_att
cs-th:atthakatha:sut.kn.khp.9_att
cs-th:atthakatha:sut.kn.man.00_att
cs-th:atthakatha:sut.kn.man.01_att
cs-th:atthakatha:sut.kn.man.02_att
cs-th:atthakatha:sut.kn.man.03_att
cs-th:atthakatha:sut.kn.man.04_att
cs-th:atthakatha:sut.kn.man.05_att
cs-th:atthakatha:sut.kn.man.06_att
cs-th:atthakatha:sut.kn.man.07_att
cs-th:atthakatha:sut.kn.man.08_att
cs-th:atthakatha:sut.kn.man.09_att
cs-th:atthakatha:sut.kn.man.10_att
cs-th:atthakatha:sut.kn.man.11_att
cs-th:atthakatha:sut.kn.man.12_att
cs-th:atthakatha:sut.kn.man.13_att
cs-th:atthakatha:sut.kn.man.14_att
cs-th:atthakatha:sut.kn.man.15_att
cs-th:atthakatha:sut.kn.man.16_att
cs-th:atthakatha:sut.kn.net.0_att
cs-th:atthakatha:sut.kn.net.1_att
cs-th:atthakatha:sut.kn.net.2_att
cs-th:atthakatha:sut.kn.net.3_att
cs-th:atthakatha:sut.kn.net.4_att
cs-th:atthakatha:sut.kn.net.5_att
cs-th:atthakatha:sut.kn.net.6_att
cs-th:atthakatha:sut.kn.pat.v0_att
cs-th:atthakatha:sut.kn.pat.v1.01_att
cs-th:atthakatha:sut.kn.pat.v1.02_att
cs-th:atthakatha:sut.kn.pat.v1.03_att
cs-th:atthakatha:sut.kn.pat.v1.04_att
cs-th:atthakatha:sut.kn.pat.v1.05_att
cs-th:atthakatha:sut.kn.pat.v1.06_att
cs-th:atthakatha:sut.kn.pat.v1.07_att
cs-th:atthakatha:sut.kn.pat.v1.08_att
cs-th:atthakatha:sut.kn.pat.v1.09_att
cs-th:atthakatha:sut.kn.pat.v1.10_att
cs-th:atthakatha:sut.kn.pat.v1_att
cs-th:atthakatha:sut.kn.pat.v2_att
cs-th:atthakatha:sut.kn.pat.v3.01_att
cs-th:atthakatha:sut.kn.pat.v3.02_att
cs-th:atthakatha:sut.kn.pat.v3.03_att
cs-th:atthakatha:sut.kn.pat.v3.04_att
cs-th:atthakatha:sut.kn.pat.v3.05_att
cs-th:atthakatha:sut.kn.pat.v3.06_att
cs-th:atthakatha:sut.kn.pat.v3.07_att
cs-th:atthakatha:sut.kn.pat.v3.08_att
cs-th:atthakatha:sut.kn.pat.v3.09_att
cs-th:atthakatha:sut.kn.pat.v3.10_att
cs-th:atthakatha:sut.kn.pat.v3_att
cs-th:atthakatha:sut.kn.pev.0_att
cs-th:atthakatha:sut.kn.pev.1_att
cs-th:atthakatha:sut.kn.pev.2_att
cs-th:atthakatha:sut.kn.pev.3_att
cs-th:atthakatha:sut.kn.pev.4_att
cs-th:atthakatha:sut.kn.snp.1_att
cs-th:atthakatha:sut.kn.snp.2_att
cs-th:atthakatha:sut.kn.snp.3_att
cs-th:atthakatha:sut.kn.snp.4_att
cs-th:atthakatha:sut.kn.snp.5_att
cs-th:atthakatha:sut.kn.tha.00_att
cs-th:atthakatha:sut.kn.tha.01_att
cs-th:atthakatha:sut.kn.tha.02_att
cs-th:atthakatha:sut.kn.tha.03_att
cs-th:atthakatha:sut.kn.tha.04_att
cs-th:atthakatha:sut.kn.tha.05_att
cs-th:atthakatha:sut.kn.tha.06_att
cs-th:atthakatha:sut.kn.tha.07_att
cs-th:atthakatha:sut.kn.tha.08_att
cs-th:atthakatha:sut.kn.tha.09_att
cs-th:atthakatha:sut.kn.tha.10_att
cs-th:atthakatha:sut.kn.tha.11_att
cs-th:atthakatha:sut.kn.tha.12_att
cs-th:atthakatha:sut.kn.tha.13_att
cs-th:atthakatha:sut.kn.tha.14_att
cs-th:atthakatha:sut.kn.tha.15_att
cs-th:atthakatha:sut.kn.tha.16_att
cs-th:atthakatha:sut.kn.tha.17_att
cs-th:atthakatha:sut.kn.tha.18_att
cs-th:atthakatha:sut.kn.tha.19_att
cs-th:atthakatha:sut.kn.tha.20_att
cs-th:atthakatha:sut.kn.tha.21_att
cs-th:atthakatha:sut.kn.thi.01_att
cs-th:atthakatha:sut.kn.thi.02_att
cs-th:atthakatha:sut.kn.thi.03_att
cs-th:atthakatha:sut.kn.thi.04_att
cs-th:atthakatha:sut.kn.thi.05_att
cs-th:atthakatha:sut.kn.thi.06_att
cs-th:atthakatha:sut.kn.thi.07_att
cs-th:atthakatha:sut.kn.thi.08_att
cs-th:atthakatha:sut.kn.thi.09_att
cs-th:atthakatha:sut.kn.thi.10_att
cs-th:atthakatha:sut.kn.thi.11_att
cs-th:atthakatha:sut.kn.thi.12_att
cs-th:atthakatha:sut.kn.thi.13_att
cs-th:atthakatha:sut.kn.thi.14_att
cs-th:atthakatha:sut.kn.thi.15_att
cs-th:atthakatha:sut.kn.thi.16_att
cs-th:atthakatha:sut.kn.uda.0_att
cs-th:atthakatha:sut.kn.uda.1_att
cs-th:atthakatha:sut.kn.uda.2_att
cs-th:atthakatha:sut.kn.uda.3_att
cs-th:atthakatha:sut.kn.uda.4_att
cs-th:atthakatha:sut.kn.uda.5_att
cs-th:atthakatha:sut.kn.uda.6_att
cs-th:atthakatha:sut.kn.uda.7_att
cs-th:atthakatha:sut.kn.uda.8_att
cs-th:atthakatha:sut.kn.viv.v0_att
cs-th:atthakatha:sut.kn.viv.v1_att
cs-th:atthakatha:sut.kn.viv.v2_att
cs-th:atthakatha:sut.mn.v00_att
cs-th:atthakatha:sut.mn.v01_att
cs-th:atthakatha:sut.mn.v02_att
cs-th:atthakatha:sut.mn.v03_att
cs-th:atthakatha:sut.mn.v04_att
cs-th:atthakatha:sut.mn.v05_att
cs-th:atthakatha:sut.mn.v06_att
cs-th:atthakatha:sut.mn.v07_att
cs-th:atthakatha:sut.mn.v08_att
cs-th:atthakatha:sut.mn.v09_att
cs-th:atthakatha:sut.mn.v10_att
cs-th:atthakatha:sut.mn.v11_att
cs-th:atthakatha:sut.mn.v12_att
cs-th:atthakatha:sut.mn.v13_att
cs-th:atthakatha:sut.mn.v14_att
cs-th:atthakatha:sut.mn.v15_att
cs-th:atthakatha:sut.sn.00_att
cs-th:atthakatha:sut.sn.01_att
cs-th:atthakatha:sut.sn.02_att
cs-th:atthakatha:sut.sn.03_att
cs-th:atthakatha:sut.sn.04_att
cs-th:atthakatha:sut.sn.05_att
cs-th:atthakatha:sut.sn.06_att
cs-th:atthakatha:sut.sn.07_att
cs-th:atthakatha:sut.sn.08_att
cs-th:atthakatha:sut.sn.09_att
cs-th:atthakatha:sut.sn.10_att
cs-th:atthakatha:sut.sn.11_att
cs-th:atthakatha:sut.sn.12_att
cs-th:atthakatha:sut.sn.13_att
cs-th:atthakatha:sut.sn.14_att
cs-th:atthakatha:sut.sn.15_att
cs-th:atthakatha:sut.sn.16_att
cs-th:atthakatha:sut.sn.17_att
cs-th:atthakatha:sut.sn.18_att
cs-th:atthakatha:sut.sn.19_att
cs-th:atthakatha:sut.sn.20_att
cs-th:atthakatha:sut.sn.21_att
cs-th:atthakatha:sut.sn.22_att
cs-th:atthakatha:sut.sn.23_att
cs-th:atthakatha:sut.sn.24_att
cs-th:atthakatha:sut.sn.25_att
cs-th:atthakatha:sut.sn.26_att
cs-th:atthakatha:sut.sn.27_att
cs-th:atthakatha:sut.sn.28_att
cs-th:atthakatha:sut.sn.29_att
cs-th:atthakatha:sut.sn.30_att
cs-th:atthakatha:sut.sn.31_att
cs-th:atthakatha:sut.sn.32_att
cs-th:atthakatha:sut.sn.33_att
cs-th:atthakatha:sut.sn.34_att
cs-th:atthakatha:sut.sn.35_att
cs-th:atthakatha:sut.sn.36_att
cs-th:atthakatha:sut.sn.37_att
cs-th:atthakatha:sut.sn.38_att
cs-th:atthakatha:sut.sn.39_att
cs-th:atthakatha:sut.sn.40_att
cs-th:atthakatha:sut.sn.41_att
cs-th:atthakatha:sut.sn.42_att
cs-th:atthakatha:sut.sn.43_att
cs-th:atthakatha:sut.sn.44_att
cs-th:atthakatha:sut.sn.45_att
cs-th:atthakatha:sut.sn.46_att
cs-th:atthakatha:sut.sn.47_att
cs-th:atthakatha:sut.sn.48_att
cs-th:atthakatha:sut.sn.49_att
cs-th:atthakatha:sut.sn.50_att
cs-th:atthakatha:sut.sn.51_att
cs-th:atthakatha:sut.sn.52_att
cs-th:atthakatha:sut.sn.53_att
cs-th:atthakatha:sut.sn.54_att
cs-th:atthakatha:sut.sn.55_att
cs-th:atthakatha:sut.sn.56_att
cs-th:atthakatha:vin.cv.01_att
cs-th:atthakatha:vin.cv.02_att
cs-th:atthakatha:vin.cv.03_att
cs-th:atthakatha:vin.cv.04_att
cs-th:atthakatha:vin.cv.05_att
cs-th:atthakatha:vin.cv.06_att
cs-th:atthakatha:vin.cv.07_att
cs-th:atthakatha:vin.cv.08_att
cs-th:atthakatha:vin.cv.09_att
cs-th:atthakatha:vin.cv.10_att
cs-th:atthakatha:vin.cv.11_att
cs-th:atthakatha:vin.cv.12_att
cs-th:atthakatha:vin.mv.01_att
cs-th:atthakatha:vin.mv.02_att
cs-th:atthakatha:vin.mv.03_att
cs-th:atthakatha:vin.mv.04_att
cs-th:atthakatha:vin.mv.05_att
cs-th:atthakatha:vin.mv.06_att
cs-th:atthakatha:vin.mv.07_att
cs-th:atthakatha:vin.mv.08_att
cs-th:atthakatha:vin.mv.09_att
cs-th:atthakatha:vin.mv.10_att
cs-th:atthakatha:vin.pac.ak_att
cs-th:atthakatha:vin.pac.nii_att
cs-th:atthakatha:vin.pac.pc_att
cs-th:atthakatha:vin.pac.pci_att
cs-th:atthakatha:vin.pac.pd_att
cs-th:atthakatha:vin.pac.pdi_att
cs-th:atthakatha:vin.pac.pri_att
cs-th:atthakatha:vin.pac.sgi_att
cs-th:atthakatha:vin.pac.sk_att
cs-th:atthakatha:vin.par.ay_att
cs-th:atthakatha:vin.par.ga_att
cs-th:atthakatha:vin.par.ni_att
cs-th:atthakatha:vin.par.pr_att
cs-th:atthakatha:vin.par.sg_att
cs-th:atthakatha:vin.par.ve_att
cs-th:atthakatha:vin.pv.01_att
cs-th:atthakatha:vin.pv.02_att
cs-th:atthakatha:vin.pv.03_att
cs-th:atthakatha:vin.pv.04_att
cs-th:atthakatha:vin.pv.05_att
cs-th:atthakatha:vin.pv.06_att
cs-th:atthakatha:vin.pv.07_att
cs-th:atthakatha:vin.pv.08_att
cs-th:atthakatha:vin.pv.09_att
cs-th:atthakatha:vin.pv.10_att
cs-th:atthakatha:vin.pv.11_att
cs-th:atthakatha:vin.pv.12_att
cs-th:atthakatha:vin.pv.13_att
cs-th:atthakatha:vin.pv.14_att
cs-th:atthakatha:vin.pv.15_att
cs-th:atthakatha:vin.pv.16_att
cs-th:atthakatha:vin.pv.17_att
cs-th:atthakatha:vin.pv.18_att
cs-th:tika:abh.ava-pura.01_tik
cs-th:tika:abh.ava-pura.02_tik
cs-th:tika:abh.ava-pura.03_tik
cs-th:tika:abh.ava-pura.04_tik
cs-th:tika:abh.ava-pura.05_tik
cs-th:tika:abh.ava-pura.06_tik
cs-th:tika:abh.ava-pura.07_tik
cs-th:tika:abh.ava-pura.08_tik
cs-th:tika:abh.ava-pura.09_tik
cs-th:tika:abh.ava-pura.10_tik
cs-th:tika:abh.ava-pura.11_tik
cs-th:tika:sut.dn.01_abh_tik
cs-th:tika:sut.dn.01_tik
cs-th:tika:sut.dn.02_abh_tik
cs-th:tika:sut.dn.02_tik
cs-th:tika:sut.dn.03_abh_tik
cs-th:tika:sut.dn.03_tik
cs-th:tika:sut.dn.04_abh_tik
cs-th:tika:sut.dn.04_tik
cs-th:tika:sut.dn.05_abh_tik
cs-th:tika:sut.dn.05_tik
cs-th:tika:sut.dn.06_abh_tik
cs-th:tika:sut.dn.06_tik
cs-th:tika:sut.dn.07_abh_tik
cs-th:tika:sut.dn.07_tik
cs-th:tika:sut.dn.08_abh_tik
cs-th:tika:sut.dn.08_tik
cs-th:tika:sut.dn.09_abh_tik
cs-th:tika:sut.dn.09_tik
cs-th:tika:sut.dn.0_tik
cs-th:tika:sut.dn.10_abh_tik
cs-th:tika:sut.dn.10_tik
cs-th:tika:sut.dn.11_abh_tik
cs-th:tika:sut.dn.11_tik
cs-th:tika:sut.dn.12_abh_tik
cs-th:tika:sut.dn.12_tik
cs-th:tika:sut.dn.13_abh_tik
cs-th:tika:sut.dn.13_tik
cs-th:tika:sut.dn.14_tik
cs-th:tika:sut.dn.15_tik
cs-th:tika:sut.dn.16_tik
cs-th:tika:sut.dn.17_tik
cs-th:tika:sut.dn.18_tik
cs-th:tika:sut.dn.19_tik
cs-th:tika:sut.dn.20_tik
cs-th:tika:sut.dn.21_tik
cs-th:tika:sut.dn.22_tik
cs-th:tika:sut.dn.23_tik
cs-th:tika:sut.dn.24_tik
cs-th:tika:sut.dn.25_tik
cs-th:tika:sut.dn.26_tik
cs-th:tika:sut.dn.27_tik
cs-th:tika:sut.dn.28_tik
cs-th:tika:sut.dn.29_tik
cs-th:tika:sut.dn.30_tik
cs-th:tika:sut.dn.31_tik
cs-th:tika:sut.dn.32_tik
cs-th:tika:sut.dn.33_tik
cs-th:tika:sut.dn.34_tik
cs-th:tika:sut.kn.paka00_tik
cs-th:tika:sut.kn.paka01_tik
cs-th:tika:sut.kn.paka02_tik
cs-th:tika:sut.kn.paka03_tik
cs-th:tika:sut.kn.paka04_tik
cs-th:tika:sut.kn.paka05_tik
cs-th:tika:sut.kn.paka06_tik
cs-th:tika:sut.kn.vibh01_tik
cs-th:tika:sut.kn.vibh02_tik
cs-th:tika:sut.kn.vibh03_tik
cs-th:tika:sut.kn.vibh04_tik
cs-th:tika:sut.kn.vibh05_tik
cs-th:tika:sut.kn.vibh06_tik
cs-th:tika:sut.mn.0_tik
cs-th:tika:sut.mn.v01_tik
cs-th:tika:sut.mn.v02_tik
cs-th:tika:sut.mn.v03_tik
cs-th:tika:sut.mn.v04_tik
cs-th:tika:sut.mn.v05_tik
cs-th:tika:sut.mn.v06_tik
cs-th:tika:sut.mn.v07_tik
cs-th:tika:sut.mn.v08_tik
cs-th:tika:sut.mn.v09_tik
cs-th:tika:sut.mn.v10_tik
cs-th:tika:sut.mn.v11_tik
cs-th:tika:sut.mn.v12_tik
cs-th:tika:sut.mn.v13_tik
cs-th:tika:sut.mn.v14_tik
cs-th:tika:sut.mn.v15_tik
cs-th:tika:sut.sn.01_tik
cs-th:tika:sut.sn.02_tik
cs-th:tika:sut.sn.03_tik
cs-th:tika:sut.sn.04_tik
cs-th:tika:sut.sn.05_tik
cs-th:tika:sut.sn.06_tik
cs-th:tika:sut.sn.07_tik
cs-th:tika:sut.sn.08_tik
cs-th:tika:sut.sn.09_tik
cs-th:tika:sut.sn.0_tik
cs-th:tika:sut.sn.10_tik
cs-th:tika:sut.sn.11_tik
cs-th:tika:sut.sn.12_tik
cs-th:tika:sut.sn.13_tik
cs-th:tika:sut.sn.14_tik
cs-th:tika:sut.sn.15_tik
cs-th:tika:sut.sn.16_tik
cs-th:tika:sut.sn.17_tik
cs-th:tika:sut.sn.18_tik
cs-th:tika:sut.sn.19_tik
cs-th:tika:sut.sn.20_tik
cs-th:tika:sut.sn.21_tik
cs-th:tika:sut.sn.22_tik
cs-th:tika:sut.sn.23_tik
cs-th:tika:sut.sn.24_tik
cs-th:tika:sut.sn.25_tik
cs-th:tika:sut.sn.26_tik
cs-th:tika:sut.sn.27_tik
cs-th:tika:sut.sn.28_tik
cs-th:tika:sut.sn.29_tik
cs-th:tika:sut.sn.30_tik
cs-th:tika:sut.sn.31_tik
cs-th:tika:sut.sn.32_tik
cs-th:tika:sut.sn.33_tik
cs-th:tika:sut.sn.34_tik
cs-th:tika:sut.sn.35_tik
cs-th:tika:sut.sn.36_tik
cs-th:tika:sut.sn.37_tik
cs-th:tika:sut.sn.38_tik
cs-th:tika:sut.sn.39_tik
cs-th:tika:sut.sn.40_tik
cs-th:tika:sut.sn.41_tik
cs-th:tika:sut.sn.42_tik
cs-th:tika:sut.sn.43_tik
cs-th:tika:sut.sn.44_tik
cs-th:tika:sut.sn.45_tik
cs-th:tika:sut.sn.46_tik
cs-th:tika:sut.sn.47_tik
cs-th:tika:sut.sn.48_tik
cs-th:tika:sut.sn.49_tik
cs-th:tika:sut.sn.50_tik
cs-th:tika:sut.sn.51_tik
cs-th:tika:sut.sn.52_tik
cs-th:tika:sut.sn.53_tik
cs-th:tika:sut.sn.54_tik
cs-th:tika:sut.sn.55_tik
cs-th:tika:sut.sn.56_tik
cs-th:tika:vin.bhi.0_dvem_tik
cs-th:tika:vin.bhi.0_kank_tik
cs-th:tika:vin.bhi.0_vima_tik
cs-th:tika:vin.bhi.v_dvem_tik
cs-th:tika:vin.bhu.0_dvem_tik
cs-th:tika:vin.bhu.0_kank_tik
cs-th:tika:vin.bhu.ni_kank_tik
cs-th:tika:vin.bhu.pc_kank_tik
cs-th:tika:vin.bhu.pr_kank_tik
cs-th:tika:vin.bhu.sg_kank_tik
cs-th:tika:vin.cv.01_sara_tik
cs-th:tika:vin.cv.01_vima_tik
cs-th:tika:vin.cv.02_sara_tik
cs-th:tika:vin.cv.02_vima_tik
cs-th:tika:vin.cv.03_sara_tik
cs-th:tika:vin.cv.03_vima_tik
cs-th:tika:vin.cv.04_sara_tik
cs-th:tika:vin.cv.04_vima_tik
cs-th:tika:vin.cv.05_sara_tik
cs-th:tika:vin.cv.05_vima_tik
cs-th:tika:vin.cv.06_sara_tik
cs-th:tika:vin.cv.06_vima_tik
cs-th:tika:vin.cv.07_sara_tik
cs-th:tika:vin.cv.07_vima_tik
cs-th:tika:vin.cv.08_sara_tik
cs-th:tika:vin.cv.08_vima_tik
cs-th:tika:vin.cv.09_sara_tik
cs-th:tika:vin.cv.09_vima_tik
cs-th:tika:vin.cv.0_paci_tik
cs-th:tika:vin.cv.0_sara_tik
cs-th:tika:vin.cv.0_vaji_tik
cs-th:tika:vin.cv.0_vima_tik
cs-th:tika:vin.cv.10_sara_tik
cs-th:tika:vin.cv.10_vima_tik
cs-th:tika:vin.cv.11_sara_tik
cs-th:tika:vin.cv.11_vima_tik
cs-th:tika:vin.cv.12_sara_tik
cs-th:tika:vin.cv.12_vima_tik
cs-th:tika:vin.kank.0_kank_tik
cs-th:tika:vin.kankha.0_dvem_tik
cs-th:tika:vin.khud.01_khud_tik
cs-th:tika:vin.khud.02_khud_tik
cs-th:tika:vin.pac.pci_vima_tik
cs-th:tika:vin.vila.08_vila_tik
cs-th:tika:vin.vila.09_vila_tik
cs-th:tika:vin.vila.10_vila_tik
cs-th:tika:vin.vila.11_vila_tik
cs-th:tika:vin.vila.12_vila_tik
cs-th:tika:vin.vila.13_vila_tik
cs-th:tika:vin.vila.14_vila_tik
cs-th:tika:vin.vila.15_vila_tik
cs-th:tika:vin.vila.16_vila_tik
cs-th:tika:vin.vila.17_vila_tik
cs-th:tika:vin.vila.18_vila_tik
cs-th:tika:vin.vila.19_vila_tik
cs-th:tika:vin.vila.20_vila_tik
cs-th:tika:vin.vila.21_vila_tik
cs-th:tika:vin.vila.22_vila_tik
cs-th:tika:vin.vila.23_vila_tik
cs-th:tika:vin.vila.24_vila_tik

_/\_

Offline Dhammañāṇa

  • Bhikkhu
  • Very Engaged Member
  • *
  • Sadhu! or +417/-0
  • Gender: Male
  • (Samana Johann)
  • Date of ordination/Datum der Ordination.: 20140527 Upasampadā 20240110
Re: from: [ATI.eu] CSCD xml to ati.eu format: converting, editing
« Reply #38 on: April 01, 2019, 06:17:50 AM »
Sadhu

Atma noticed that the search works less well for scripts other than Latin, but does not yet understand why certain ranges are separated character by character.
This post and Content has come to be by Dhamma-Dana and so is given as it       Dhamma-Dana: Johann

Offline Dhammañāṇa

Re: [ATI.eu] Indexing and search engine issues
« Reply #39 on: April 01, 2019, 06:34:00 PM »
Atma has attached the Khmer and Thai Unicode tables so that special characters such as stops and punctuation marks can possibly be excluded. Breaking the text into single characters seems to be meaningless.

Not sure whether Upasaka Vorapol would like to assist here for Thai. Atma will try to list special characters in Khmer but not sure now how the indexer or better the search engine works with compunctions and so on generally (simply cut them all away?)

Offline Dhammañāṇa

Re: [ATI.eu] Indexing and search engine issues
« Reply #40 on: April 01, 2019, 07:19:51 PM »
17D4 KHMER SIGN KHAN • functions as a full stop, period
(→ 0E2F   thai character paiyannoi
→ 104A   myanmar sign little section)

17D5 KHMER SIGN BARIYOOSAN • indicates the end of a section or a text
(→ 0E5A   thai character angkhankhu
→ 104B   myanmar sign section)

17D6 KHMER SIGN CAMNUC PII KUUH • functions as colon
(• the preferred transliteration is camnoc pii kuuh
→ 00F7 ÷  division sign
→ 0F14   tibetan mark gter tsheg)

17D7 KHMER SIGN LEK TOO • repetition sign
(→ 0E46   thai character maiyamok )

17D8 KHMER SIGN BEYYAL • et cetera
• use of this character is discouraged; other abbreviations for et cetera also exist • preferred spelling: ។ល។

17D9 KHMER SIGN PHNAEK MUAN • indicates the beginning of a book or a treatise • the preferred transliteration is phnek moan
(→ 0E4F   thai character fongman )

17DA KHMER SIGN KOOMUUT • indicates the end of a book or treatise • this forms a pair with 17D9   • the preferred transliteration is koomoot
(→ 0E5B   thai character khomut )

17DB KHMER CURRENCY SYMBOL RIEL

17E0 KHMER DIGIT ZERO
17E1 KHMER DIGIT ONE
17E2 KHMER DIGIT TWO
17E3 KHMER DIGIT THREE
17E4 KHMER DIGIT FOUR
17E5 KHMER DIGIT FIVE
17E6 KHMER DIGIT SIX
17E7 KHMER DIGIT SEVEN
17E8 KHMER DIGIT EIGHT
17E9 KHMER DIGIT NINE

0E50 THAI DIGIT ZERO
0E51 THAI DIGIT ONE
0E52 THAI DIGIT TWO
0E53 THAI DIGIT THREE
0E54 THAI DIGIT FOUR
0E55 THAI DIGIT FIVE
0E56 THAI DIGIT SIX
0E57 THAI DIGIT SEVEN
0E58 THAI DIGIT EIGHT
0E59 THAI DIGIT NINE

Word breaks are either white spaces or zero width spaces in both scripts, Khmer and Thai.
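
A minimal sketch of how these word breaks and signs could be used as separators, assuming Python. The helper name `split_words` and the choice of which signs count as separators are assumptions made for illustration, not the site's indexer code. The repetition signs (17D7, 0E46) are left out on purpose, since they belong to the word they follow.

```python
import re
from typing import List

# Separator signs taken from the list above (assumed selection).
KHMER_SIGNS = "\u17D4\u17D5\u17D6\u17D8\u17D9\u17DA"
THAI_SIGNS = "\u0E2F\u0E5A\u0E4F\u0E5B"
ZERO_WIDTH_SPACE = "\u200B"  # not matched by \s, so it is listed explicitly

# One or more whitespace characters, zero-width spaces, or listed signs.
_SEPARATORS = re.compile(
    "[\\s" + ZERO_WIDTH_SPACE + KHMER_SIGNS + THAI_SIGNS + "]+"
)


def split_words(text: str) -> List[str]:
    """Split text into words at white space, zero-width spaces and signs."""
    return [piece for piece in _SEPARATORS.split(text) if piece]
```

With such a rule, a run of Khmer or Thai letters between two breaks stays together as one word instead of being cut into single characters.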

Offline Moritz

  • Chief housekeeper / Chefhausmeister
  • Very Engaged Member
  • *
  • Sadhu! or +299/-0
  • Gender: Male
Re: [ATI.eu] Indexing and search engine issues
« Reply #41 on: April 01, 2019, 09:58:43 PM »
Sadhu, I was just reading the search code, trying to understand a bit of how it works.

The idea of breaking text into single characters probably came from Chinese or Japanese, where I think a character usually represents a complete word. Maybe Thai was included by mistake, on the assumption that all Asian scripts use such logographic "full word" characters.

I think it is best to just remove the Thai block from the "Asian" set and treat it "normally", like Roman etc. The Khmer block (1780–17FF) is not included there either, and search is working well for it. The most important change needed would probably be to use zero-width spaces as separators.
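
Roughly, the distinction could look like the sketch below, assuming Python. The range table and the function names are hypothetical, since the ranges the search code really uses are not shown in this thread. Only scripts in the per-character table are cut into single characters; Thai (0E00–0E7F) and Khmer (1780–17FF) are absent from it and are split only at white space and zero-width spaces.

```python
from typing import List

# Hypothetical table of ranges that are indexed one character at a time.
PER_CHARACTER_RANGES = [
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]
ZERO_WIDTH_SPACE = "\u200b"


def is_per_character(ch: str) -> bool:
    """True if the character belongs to a range indexed character by character."""
    code = ord(ch)
    return any(low <= code <= high for low, high in PER_CHARACTER_RANGES)


def tokenize(text: str) -> List[str]:
    """Emit per-character tokens for listed ranges, whole words otherwise."""
    tokens: List[str] = []
    current = ""
    for ch in text:
        if is_per_character(ch):
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        elif ch.isspace() or ch == ZERO_WIDTH_SPACE:
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens
```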

not sure now how the indexer or better the search engine works with compunctions and so on generally (simply cut them all away?)
* Moritz ("punctuation" - not "compunction" = "Gewissenhaftigkeit", as Bhante Thanissaro translates "otappa")
Punctuation marks are removed during indexing, and the resulting pieces are stored as "words" in the index, along with some reference tables to store which page has how many occurrences of each word.
When searching, the tables are first searched to find all pages that contain every single word of the search phrase. Then, if the search phrase (or parts of it) has been put in quotes "", the whole phrase (or the quoted parts) is also matched against the text to find the exact occurrence, including punctuation marks and so on.
For example, searching for "ist den Drei Juwelen, dem Buddha, dem Dhamma, der Sangha, gewidmet" will find one result on the page http://www.accesstoinsight.eu/km/index now.
* Moritz (Strange: It should also find the same on http://www.accesstoinsight.eu/de/index, but it does not. Seems like the index is somehow incomplete again.)
If one comma or one word is left out, as in "ist den Drei Juwelen Buddha, dem Dhamma, der Sangha, gewidmet", still put in quotes, no result is found, because there is no exact match for the quoted phrase.
However, searching for the same words without quotation marks finds results again, because then only each single word is looked up, not the whole phrase, and all punctuation is ignored.
* Moritz (Strange: Searching in this way also gives http://www.accesstoinsight.eu/de/index as a result, as it should be. But it did not find the match when searching for the whole quoted phrase.)

So it should still be possible then to find exact text passages including punctuation marks, if quoted. Although, as just seen, sometimes the search engine might not work as it should. ^-^
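
The two-stage lookup described above can be sketched roughly as follows, assuming Python. All names (`words`, `build_index`, `search`) and the page ids are made up for illustration and are not the search engine's real functions. The word pattern `\w+` is a simplification that suits Latin text, and the quoted-phrase check is a plain case-sensitive substring match.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List

_WORD = re.compile(r"\w+")


def words(text: str) -> List[str]:
    """Lower-cased words with all punctuation dropped."""
    return [w.lower() for w in _WORD.findall(text)]


def build_index(pages: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Map each word to the pages containing it, with occurrence counts."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for page_id, text in pages.items():
        for word, count in Counter(words(text)).items():
            index[word][page_id] = count
    return index


def search(query: str, pages: Dict[str, str],
           index: Dict[str, Dict[str, int]]) -> List[str]:
    """Stage 1: pages containing every word. Stage 2: exact quoted parts."""
    quoted = re.findall(r'"([^"]+)"', query)
    required = words(query)
    if not required:
        return []
    candidates = set(pages)
    for word in required:
        candidates &= set(index.get(word, {}))
    return sorted(
        page_id for page_id in candidates
        if all(phrase in pages[page_id] for phrase in quoted)
    )
```

With a page containing "Juwelen, dem Buddha", the quoted query `"Juwelen, dem Buddha"` matches it, the quoted query `"Juwelen dem Buddha"` does not, and the unquoted query `Juwelen dem Buddha` matches it again.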

I think I know now how to make the necessary changes to include the Khmer and Thai punctuation marks and zero-width spaces as separators. After adding these separators, the pages would have to be re-indexed.

I will try to do it later this week. Not today anymore.

_/\_

Offline Dhammañāṇa

Re: [ATI.eu] Indexing and search engine issues
« Reply #42 on: April 01, 2019, 10:11:35 PM »
Sadhu